Abstract. The Schelling model of segregation looks to explain the way in which a population of agents
or particles of two types may come to organise itself into large homogeneous clusters, and can be seen as a 
variant of the Ising model in which the system is subjected to rapid cooling. While the model has been very 
extensively studied, the unperturbed (noiseless) version has largely resisted rigorous analysis, with most 
results in the literature pertaining to versions of the model in which noise is introduced into the dynamics 
so as to make it amenable to standard techniques from statistical mechanics or stochastic evolutionary game 
theory. 

We rigorously analyse the one-dimensional version of the model in which one of the two types is in the 
minority, and establish various forms of threshold behaviour. Our results are in sharp contrast with the case 
when the distribution of the two types is uniform (i.e. each agent has equal chance of being of each type in 
the initial configuration), which was studied in [BIKK12, BELP14].

1 Introduction


The economist Thomas Schelling introduced his model of segregation in [Sch69] (developed later in
[Sch71a, Sch71b]), with the explicit intention of explaining the phenomenon of racial segregation in large
cities. Perhaps the earliest agent-based model studied by economists, it has since become an archetype
of agent-based modelling, featuring prominently in libraries of modelling software tools such as
NetLogo [Wil99] and often being the subject of experimental analysis and simulations in the modelling and
AI communities [CMGP13, CM11, Fos06, GB02, Sch07, YO09, HCSB11, dSGL07, EA96]. Many
versions of the model have been analysed theoretically, from a number of different viewpoints and
disciplines: statistical mechanics [DCM08, CFL09] and [Ber12, Section 3.1], evolutionary game theory
[You98, Zha04a, Zha04b, Zha11], the social sciences [CF08, Cla91, SSD00], and more recently computer
science and AI [CACP07, BMR14, BIKK12, BELP14]. It was observed in [BIKK12], however, that
despite the vast amount of work that has been done on the Schelling model in the last 40 years, rigorous
mathematical analyses in the previous literature generally concern altered versions of the model, in which
noise is introduced in the dynamics, i.e. where one allows that agents may make non-rational decisions that
are detrimental to their welfare with small probability. The introduction of such ‘perturbations’ may be
justifiable from a ‘bounded rationality’ standpoint.

The model (which will be formally defined shortly) concerns a population of agents arranged
geographically, each being of one of two types. Each agent has a certain neighbourhood around them that they are
concerned with, and also an intolerance parameter τ ∈ [0,1] which we shall assume here to be the same for
all agents. An agent’s behaviour is dictated by the proportion of the agents in their neighbourhood which
are of their own type. So long as this proportion is at least τ the agent may be considered ‘happy’ and will not
move. Starting with a random configuration, one then considers a discrete time dynamical process. At
each stage unhappy agents may be given the opportunity to move, swapping positions with another agent,
so as to increase the proportion of their own type within their neighbourhood. Now one might justify a
perturbed version of these dynamics, in which agents will occasionally move in such a way as to decrease
their utility (i.e. the proportion of their own type within their neighbourhood) by arguing, for example, that
it is reasonable to suppose that only incomplete information about the make-up of each neighbourhood is
available to the agents. It is a fact, however, that

(a) the methods used for the analysis of the perturbed models do not apply to the unperturbed model; 

(b) the segregation that occurs in the perturbed models is often very different from that in the unperturbed
model.

In the unperturbed models the underlying Markov chain does not have the regularities that are found in 
the perturbed case (e.g. the Markov process is irreversible). The presence of a large variety of absorbing 
states means that entirely different and more combinatorial methods are now required. Beyond the basic 
aim of a rigorous analysis for these unperturbed models, which have been so extensively studied via
simulations, further motivation is provided by the fact that the Schelling model is part of a large family of
models, arising in a broad variety of contexts—spin glass models, Hopfield nets, cascading phenomena as 
studied by those in the networks community—all of which look to understand the discrete time dynamics 
of competing populations on underlying network structures of one kind or another, and for many of which 
the unperturbed dynamics are of significant interest. The hope is that techniques developed in analysing 
unperturbed Schelling segregation may pave the way for similar analyses in these variants of the model. 

Table 1: Parameters of the Schelling model and the main result.

Parameter                              Symbol    Range
Population                             n         ℕ
Neighbourhood radius                   w         [0, n]
Tolerance threshold                    τ         [0, 1]
Expected/actual minority proportion    p / p*    [0, 1]

Process parameters          Segregation
τ < τ₀  &  p < τ₀           Negligible
τ < κ₀  &  p < 0.5          Negligible
τ < 0.5 &  p < 0.25         Negligible
τ > 0.5 &  p < 0.5          Complete

The first rigorous analysis of an unperturbed Schelling model was described by Brandt, Immorlica, Kamath,
and Kleinberg in [BIKK12]. In this work it was also demonstrated that the eventual state of the process
differs significantly from the stochastically stable states of the perturbed models. This study focused on the
one-dimensional Schelling model and provided an asymptotic analysis, in the sense that the results hold 
with arbitrarily high probability for all sufficiently large neighbourhoods and population. More
significantly, however, it dealt only with the symmetric case where the intolerance parameter is τ = 0.5 (i.e. an agent is
happy when at least 50% of the agents in its neighbourhood are of its own type). In [BELP14] a much more
general analysis of the unperturbed one-dimensional Schelling model for τ ∈ [0,1] was provided. In fact
it was shown there that various forms of surprising threshold behaviour exist. A significant symmetry
assumption underlying the results in [BIKK12, BELP14] is that the populations of the two types of agents are
assumed to be uniform (i.e. each agent has equal chance of being of each type in the initial configuration).
Indeed, there is no rigorous study of the unperturbed spatial proximity model with swapping agents for the
rather realistic case where the distribution of the two types of agents is skewed. In fact, the question as to
what type of segregation occurs with a skewed population distribution was raised by Brandt, Immorlica,
Kamath, and Kleinberg in [BIKK12, Section 4] as well as in popular expositions of the Schelling model
like [Hay13].

The purpose of the present work is to give an answer to this question. We show that complete segregation 
is the likely outcome if and only if the intolerance parameter is larger than 0.5. Moreover in the case that 
the minority type is at most 25%, there is a dichotomy between complete segregation and almost complete 
absence of segregation. 


1.1 Definition of the model 

Schelling’s model of residential segregation belongs to a large family of agent-based models, where a
system of competitive agents perform actions in order to increase their personal welfare, while possibly
decreasing the welfare of other individuals. This phenomenon roughly corresponds to the so-called
spontaneous order approach¹ in the economics literature, which studies the emergence of norms from the endogenous
agreements among rational individuals.

¹ This contrasts with the mechanism design approach, which studies the exogenous (a priori) design of regulations
in order to achieve desired properties in a system of interacting agents.

The Schelling model that we study is a direct generalisation of that in [BIKK12] and also that studied by
the authors in [BELP14]. The one-dimensional model with parameters n, w, τ, p (as listed in Table 1) is
defined as follows. We consider n individuals which occupy an equal number of sites 0, 1, ..., n − 1 (ordered
clockwise) on a circle. Each of the individuals belongs to one of the two types α and β. The type assignment
of individuals is independent and identically distributed (i.i.d.), with each individual having probability p
of being type β. Without loss of generality we always assume that p < 0.5, i.e. that the individuals of type
β are the expected minority (so long as p ≠ 0.5). This random type assignment takes place at stage 0 of
the process, and defines the initial state. At the end of stage 0, we let p* be the actual proportion of the
individuals that are of type β.

Figure 2: Threshold behaviour when τ, p are in [0, 0.5]. The two-dimensional axes refer to τ and p. In the first
figure, the process is static except for the (τ, p) in the small area at the top right corner. The second figure is
a plot of P_stab and P_unhap (for w = 100) as functions of (τ, p). The third figure is a plot of g(τ, p) for w = 100.

Unless stated otherwise, addition and subtraction on indices for sites are performed modulo n. Given two
sites u, v in any configuration of the individuals on the circle, the interval [u, v] consists of the individuals
that occupy sites between u and (n + v) mod n (inclusive). For example, if 0 ≤ v < u < n then we
let [u, v] denote the set of nodes [u, n − 1] ∪ [0, v] (while [v, u] is, of course, understood in the standard
way). When we talk about a particular configuration, we identify each individual with the site it occupies,
referring to both entities as a node. The neighbourhood of node u consists of the interval [u − w, u + w],
where w is a parameter of the model that we call the (neighbourhood) radius. The tolerance threshold
τ ∈ [0,1] is another parameter of the model that reflects how tolerant a node is to nodes of different type in
its neighbourhood. We say that a node is happy if the proportion of the nodes in its neighbourhood which
are of its own type is at least τ.

Given the initial type assignment (colouring) of the nodes, the Schelling process then evolves dynamically 
in stages as follows. At each stage s > 0 we pick uniformly at random a pair of unhappy nodes of different
type, and we swap them provided that in both cases the number of nodes of the same type in the new 
neighbourhood is at least that in the original neighbourhood. If at some stage there are no further legal 
swaps the process terminates. If at some stage all nodes of the same type are grouped into a single block, 
we say that at that stage we have complete segregation. 
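
The dynamics just described can be sketched in a few lines of Python. This is only an illustrative simulation under our own assumptions, not the authors' code: the names `ALPHA`, `BETA`, `step` and `run` are hypothetical, a rejected swap is simply undone and counted as a stage, and the run stops on a dormant state or after `max_stages` rather than testing for the absence of legal swaps.

```python
import random

ALPHA, BETA = 0, 1  # our own encoding of the two types


def same_type_count(state, u, w):
    """Number of nodes in [u - w, u + w] (indices mod n) with the type of node u, u included."""
    n = len(state)
    return sum(1 for d in range(-w, w + 1) if state[(u + d) % n] == state[u])


def is_happy(state, u, w, tau):
    """Happy means: at least a tau fraction of the 2w + 1 neighbourhood has the node's own type."""
    return same_type_count(state, u, w) >= tau * (2 * w + 1)


def mixing_index(state, w):
    """Sum over alpha-nodes of the number of beta-nodes in their neighbourhood."""
    return sum((2 * w + 1) - same_type_count(state, u, w)
               for u in range(len(state)) if state[u] == ALPHA)


def step(state, w, tau, rng):
    """Attempt one stage in place; returns 'dormant', 'swapped' or 'rejected'."""
    unhappy = {ALPHA: [], BETA: []}
    for u in range(len(state)):
        if not is_happy(state, u, w, tau):
            unhappy[state[u]].append(u)
    if not unhappy[ALPHA] or not unhappy[BETA]:
        return "dormant"  # all nodes of at least one type are happy
    u, v = rng.choice(unhappy[ALPHA]), rng.choice(unhappy[BETA])
    before_u, before_v = same_type_count(state, u, w), same_type_count(state, v, w)
    state[u], state[v] = state[v], state[u]
    # After the swap the alpha individual sits at v and the beta individual at u.
    if same_type_count(state, v, w) >= before_u and same_type_count(state, u, w) >= before_v:
        return "swapped"
    state[u], state[v] = state[v], state[u]  # undo a swap that would hurt one of the two nodes
    return "rejected"


def run(n, w, tau, p, max_stages, seed=0):
    """Random initial state with P(beta) = p, then up to max_stages stages."""
    rng = random.Random(seed)
    state = [BETA if rng.random() < p else ALPHA for _ in range(n)]
    history = [mixing_index(state, w)]
    for _ in range(max_stages):
        if step(state, w, tau, rng) == "dormant":
            break
        history.append(mixing_index(state, w))
    return state, history
```

Calling, say, `run(1000, 20, 0.6, 0.3, 5000)` returns the final configuration together with the mixing index after each stage; the sketch assumes n > 2w + 1 so that a neighbourhood never wraps onto itself.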

This completes the definition of the Schelling process with parameters n, w, τ and p, which we denote
by the tuple (n, w, τ, p). The process can be seen as a Markov chain with 2^n states corresponding to the
configurations that we get by varying the type of each node between α and β. A state is called dormant if
either all α-nodes are happy, or all β-nodes are happy. We shall be interested in the case that w is large,
and that n is large compared to w. In this context it will turn out that the absorbing states of the Schelling
process are exactly the dormant states and, in fact, the only recurrence classes of the Schelling process are
the dormant states and complete segregation. Note that the number of nodes of type α and of type β does
not change between transitions, once the initial state has been chosen.
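
As a small illustration of the two kinds of terminal configuration, the following sketch tests a ring for complete segregation and for dormancy. The helper names and the string encoding of states ('a' for α, 'b' for β) are our own assumptions, and the segregation test presumes that both types are present.

```python
def is_completely_segregated(state):
    """True if each type forms one contiguous block on the circle (both types assumed present)."""
    n = len(state)
    boundaries = sum(1 for u in range(n) if state[u] != state[(u + 1) % n])
    return boundaries == 2


def unhappy_types(state, w, tau):
    """Set of types that have at least one unhappy node."""
    n = len(state)
    needed = tau * (2 * w + 1)
    found = set()
    for u in range(n):
        same = sum(1 for d in range(-w, w + 1) if state[(u + d) % n] == state[u])
        if same < needed:
            found.add(state[u])
    return found


def is_dormant(state, w, tau):
    """Dormant: all alpha-nodes are happy, or all beta-nodes are happy."""
    return len(unhappy_types(state, w, tau)) < 2


blocks = "a" * 30 + "b" * 10
alternating = "ab" * 20
print(is_completely_segregated(blocks), is_dormant(blocks, 2, 0.7))
print(is_completely_segregated(alternating), is_dormant(alternating, 2, 0.6))
```

The first example is completely segregated without being dormant (the nodes at the block boundaries are unhappy for both types), which matches the remark that complete segregation is a recurrence class rather than a state at which the process stops.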

1.2 Our results 

Given the Schelling process (n, w, τ, p) we wish to determine with high probability the type of equilibrium
that will eventually occur in the system. Moreover, we are interested in asymptotic results, i.e. statements
that hold with arbitrarily high probability for all sufficiently large w and all sufficiently large n compared to
w. We denote this quantification on w, n by ‘0 ≪ w’ and ‘w ≪ n’ respectively (and write ‘0 ≪ w ≪ n’ for
the combined statement). The following definition encapsulates the type of asymptotic statements about
the Schelling process (n, w, τ, p) that we are interested in establishing.

Definition 1.1 (Properties with high probability and static processes). Suppose that R is a property which
may or may not be satisfied by any given run of the Schelling process (n, w, τ, p), and T is a property of the
parameters τ, p. By the sentence “if T(τ, p), then with high probability R(n, w, τ, p)” we mean that, provided
that τ, p satisfy T, for every ε > 0 and all w ≫ 1/ε, n ≫ w the process (n, w, τ, p) satisfies R with probability
at least 1 − ε. We say that the process (n, w, τ, p) is static if given ε > 0, with high probability the number
of nodes that ever change their type in the entire duration of the process is at most ε·n.

Table 3: Metrics of welfare and critical stages in the unbalanced happiness process, as stopping times for certain conditions.

Metric                    Symbol    Dynamics
Social welfare            V         Positive (strictly if τ < 0.5)
Mixing index              MIX       Negative (strictly if τ < 0.5)
No. of unhappy nodes      U         Approximately negative if τ > 0.5
No. of unhappy α-nodes    U_α       Ambiguous

Stage     Stopping time for
T_g       G_s > τp·n/(4w)
T_y       Y_s < G_s, s < T_g
T_mix     MIX > n(w + 1)τp*
T_stop    Q V o

By [BIKK12, BELP14], the asymptotic behaviour of the process (n, w, τ, p) is known for p = 0.5 (except
perhaps on the threshold τ = κ₀ ≈ 0.353). The present work is dedicated to the case where one type of
node is the minority, i.e. when p < 0.5. We show that with probability 1 the process will either reach
complete segregation or reach a dormant state. Complete segregation is, strictly speaking, a recurrence
class of the process, consisting of the rotations of the two blocks, one consisting of all the α-nodes and the
other consisting of all the β-nodes. Hence, modulo symmetries, we may regard complete segregation as an
absorbing state. Dormant states are a different kind of absorbing state, as the process actually stops when
it hits a dormant state. We show that when τ > 0.5 the highly probable outcome is complete segregation.
Moreover, in many cases when τ < 0.5 the outcome is negligible segregation (i.e. the process is static). Let
κ₀ ≈ 0.353 and τ₀ ≈ 0.4115 be the unique solutions of (0.5 − x)^{0.5−x} = (1 − x)^{1−x} and
2τ·(0.5 − τ)^{1−2τ} = (1 − τ)^{2(1−τ)} respectively in [0, 0.5].
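
The two constants can be checked numerically. The sketch below is our own illustration and assumes the defining equations read (0.5 − x)^(0.5 − x) = (1 − x)^(1 − x) for κ₀ and 2τ(0.5 − τ)^(1 − 2τ) = (1 − τ)^(2(1 − τ)) for τ₀; it takes logarithms of both sides and applies plain bisection on a bracket strictly inside (0, 0.5). The function names are hypothetical.

```python
import math


def bisect_root(f, lo, hi, tol=1e-12):
    """Plain bisection; assumes f(lo) and f(hi) have opposite signs."""
    f_lo = f(lo)
    while hi - lo > tol:
        mid = (lo + hi) / 2
        f_mid = f(mid)
        if (f_mid > 0) == (f_lo > 0):
            lo, f_lo = mid, f_mid
        else:
            hi = mid
    return (lo + hi) / 2


def kappa_equation(x):
    # log of (0.5 - x)^(0.5 - x) minus log of (1 - x)^(1 - x)
    return (0.5 - x) * math.log(0.5 - x) - (1 - x) * math.log(1 - x)


def tau_equation(t):
    # log of 2t (0.5 - t)^(1 - 2t) minus log of (1 - t)^(2 (1 - t))
    return math.log(2 * t) + (1 - 2 * t) * math.log(0.5 - t) - 2 * (1 - t) * math.log(1 - t)


kappa0 = bisect_root(kappa_equation, 0.01, 0.49)
tau0 = bisect_root(tau_equation, 0.01, 0.49)
print(kappa0, tau0)  # should be close to the values 0.353 and 0.4115 quoted in the text
```

Both logarithmic differences are increasing on (0, 0.5), which is why a single sign change, and hence a single root, is found in each case.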

Theorem 1.2 (Main result). If τ > 0.5, p < 0.5 and τ + p ≠ 1, then with high probability the Schelling
process (n, w, τ, p) reaches complete segregation. The process (n, w, τ, p) is static (with high probability) if

[τ < τ₀ & p < τ₀] or [τ < κ₀ & p < 0.5] or [τ < 0.5 & p < 0.25]

or, more generally, if 2p·(1 − 2κ₀) + τ + κ₀ < 1.

The values of (τ, p) for which we show that the process is static correspond to the yellow area of the first
diagram (or, equivalently, the collapsed part of the surface of the third diagram) of Figure 2. The case when
p < 0.25 presents a remarkable contrast as τ crosses the boundary of 0.5. In this case, when τ exceeds the
threshold 0.5, the process changes from static to the other extreme of complete segregation.

Corollary 1.3 (Phase transition on 0.5). If p < 0.25, then with high probability the process (n, w, τ, p)

• converges to complete segregation if τ > 0.5;

• is static if τ < 0.5.

Moreover with high probability it reaches its final state in time o(n) if τ < 0.5, and time Ω(n) if τ > 0.5.

Table 4: Two cases for the process (n, w, τ, p) and the corresponding expectations of the number of initially happy nodes.

We display these results in the second item of Table 1. In Sections 2-4 we present the argument that 
proves these results. This argument uses a number of smaller results which are stated without proof, and 
are the building blocks of the proof of Theorem 1.2. It is our intention that the reader gets a fairly good 
understanding of our analysis in this part of the paper, without the burden of having to verify some of the 
more technical parts of the proof. Section 5 is an appendix with detailed proofs of all the facts that were 
used in Sections 2-4, and completes the proofs of Theorem 1.2 and Corollary 1.3. 

Our proof of Theorem 1.2 is nonuniform, and the analysis is roughly divided into the two cases displayed in Table
4: balanced and unbalanced happiness. Here happiness refers to the numbers of initially happy nodes of
the two types, and determines the dynamics that drive the process to an equilibrium. Of the two cases,
unbalanced happiness is the more challenging to deal with: the dynamics are driven by the small number
of unhappy α-nodes against the large number of unhappy β-nodes, an imbalance which in fact is preserved
throughout a significant part of the process.


2 Metrics and reaching complete segregation 

One of the most challenging problems in the analysis of the segregation process is the large number of 
absorbing states. In order to understand which transitions are possible, we use certain metrics that describe 
the current state. 

2.1 Welfare, mixing, and expectations 

We define global metrics that reflect the welfare of the entire population. An obvious choice is the number
of happy nodes at a given state. It is not hard to devise transitions of the process which reduce the total
number of happy nodes (see the second plot of Figure 5). However it is possible to show that if τ > 0.5
the total number of happy nodes is approximately non-decreasing (in the sense that it is Θ(g) for some
nondecreasing function g on the stages, where the underlying constant depends only on w). Let the utility
of a node (at a certain state) be the number of nodes of the same type in its neighbourhood. A better behaved
global metric of welfare of a state is the sum of the utilities of the nodes in the state. We call this parameter
the social welfare of the state and denote it by V. A consequence of the transition rule and the definition of
utility is that the social welfare does not decrease along the stages of the process. Furthermore, if τ < 0.5,
every transition of the process strictly increases the social welfare. Let the mixing index of a node be the
number of nodes in its neighbourhood that are of different type. The mixing index MIX of a state is the sum
of the mixing indices of the α-nodes in that state. The mixing index of a state is also equal to the sum of
the mixing indices of the β-nodes in that state. The relationship between the two metrics is

V = (2w + 1)·n − 2·MIX.

Figure 5 (both plots have the stages on the horizontal axis): The first plot is from the process (200000, 50, 0.6, 0.3)
and the second one from the process (1000, 20, 0.6, 0.3). These simulations illustrate that the number of α-nodes in
the infected area remains bounded, until the number of β-nodes outside the infected area becomes small. The
second plot also illustrates the fact that the number of unhappy nodes fluctuates locally.


Hence the mixing index is non-increasing along the transitions. Note that a single swap cannot decrease
the mixing index by more than 4w. On the other hand, by linearity of expectation we can calculate that

the expectation of the mixing index in the initial state of (n, w, τ, p) is 2nwp(1 − p).
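
The relationship between social welfare and mixing index, the expected initial mixing index, and the mixing index of a completely segregated ring can all be checked on a small example. The sketch below is our own illustration; the function names and the encoding of types as the strings 'a' and 'b' are assumptions, and the random sample is only expected to be close to 2nwp(1 − p), not equal to it.

```python
import random


def neighbourhood(u, w, n):
    """Sites of the interval [u - w, u + w] on a circle of n sites."""
    return [(u + d) % n for d in range(-w, w + 1)]


def social_welfare(state, w):
    """Sum over all nodes of the number of same-type nodes in their neighbourhood."""
    n = len(state)
    return sum(sum(1 for v in neighbourhood(u, w, n) if state[v] == state[u])
               for u in range(n))


def mixing_index(state, w, node_type="a"):
    """Sum over nodes of the given type of the number of opposite-type neighbours."""
    n = len(state)
    return sum(sum(1 for v in neighbourhood(u, w, n) if state[v] != state[u])
               for u in range(n) if state[u] == node_type)


n, w, p = 2000, 10, 0.3
rng = random.Random(0)
state = ["b" if rng.random() < p else "a" for _ in range(n)]
V, mix = social_welfare(state, w), mixing_index(state, w)
print(V == (2 * w + 1) * n - 2 * mix)   # the identity holds exactly
print(mix, 2 * n * w * p * (1 - p))     # sample value next to its expectation

segregated = ["b"] * 600 + ["a"] * 1400
print(mixing_index(segregated, w), w * (w + 1))
```

The identity is exact because every pair of unlike nodes within distance w is counted once from the α side and once from the β side.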

The mixing index of complete segregation (in nontrivial cases) is w(w + 1). Since p < 1/2, this means that
(with high probability) the process can reach complete segregation only after (np − (w + 1))/4 > np/5 stages,
i.e. Ω(n) stages. On the other hand, a case analysis shows that if τ < 0.5, each step in the process decreases
the mixing index by at least 4. This means that if τ < 0.5 and the process is static, then it reaches its final
state within o(n) stages. This happens because each time a swap occurs, the mixing index decreases by at
least 4 (so it is not possible that the same few nodes swap more than o(n) times). We have thus shown that the
second clause of Corollary 1.3 (concerning the time to the final state) follows from the first clause.
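
For the reader's convenience, the lower bound on the number of stages can be spelled out as follows. This is our own expansion of the step, under the assumption that the initial mixing index is replaced by its expectation (it is concentrated around it with high probability); each swap lowers the mixing index by at most 4w.

```latex
\begin{align*}
\#\text{stages}
  \;\ge\; \frac{\mathrm{MIX}_{\text{initial}} - \mathrm{MIX}_{\text{final}}}{4w}
  \;\approx\; \frac{2nwp(1-p) - w(w+1)}{4w}
  \;=\; \frac{np(1-p)}{2} - \frac{w+1}{4}
  \;>\; \frac{np - (w+1)}{4},
\end{align*}
```

where the last inequality uses 1 − p > 1/2. The final expression exceeds np/5 as soon as n > 5(w + 1)/p, which holds for all n sufficiently large compared to w.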

As another measure of mixing, we may consider the number k_β of maximal β-blocks in the state. These are
the contiguous β-blocks that are maximal, in the sense that they cannot be extended to a larger contiguous
β-block. Let U be the number of unhappy nodes in a state. It is not hard to show that if τ > 0.5 then
MIX = Θ(U) = Θ(k_β) and in particular

MIX ≤ w·(w + 1)·k_β ≤ w·(w + 1)·U ≤ MIX·2w/(1 − τ). (2.1.1)

This means that the number of unhappy nodes at a certain state reflects the progress of the process towards
segregation. More precisely, the metrics MIX, k_β, U are mutually proportional when τ > 0.5, where the
constant of proportionality depends on w (see Figure 5). In Table 3 we display these global metrics of welfare,
along with their dynamics. A function (on the stages of the process) has positive dynamics if it is
non-decreasing and approximately positive dynamics if it is Θ(g) for some nondecreasing function g, where the
multiplicative constant does not depend on n. Similar definitions apply for ‘negative’. The first clause of
Theorem 1.2 (the case when τ > 0.5) is the hardest to prove. It turns out that in this case we can deduce a
non-trivial lower bound on the mixing index of dormant states.

Lemma 2.1 (Mixing in dormant states). Consider the process (n, w, τ, p) with τ > 0.5. The mixing index in
a dormant state is more than n(w + 1)τp*, as long as w > 1/(2τ − 1).

The case τ > 0.5 is further divided into two cases, which reflect the proportions of happy nodes in the initial
state. We display these in Table 4, along with the corresponding expectations for the numbers of happy
nodes of each type. Lemma 2.1 is crucial for the proof of the first clause of Theorem 1.2 (in particular the
case τ + p < 1).

Figure 6: The path to a dormant state or complete segregation when τ > 0.5.

2.2 Accessibility of dormant states and complete segregation 

We prove the case of Theorem 1.2 where τ > 0.5 and τ + p > 1. This argument consists of two parts.
First, we show that in this case with high probability the initial state is such that every state with the same
number of α-nodes has unhappy nodes of both types (i.e. it is not dormant). Hence under these conditions,
no accessible state is dormant. The second part consists of showing that from every state there is a sequence
of transitions to either a dormant state or complete segregation. Moreover the latter fact holds in general,
for any values of τ, p, so it can be reused for the case when τ + p < 1, in Section 3. This latter case is more
challenging, as it can be seen that there are permutations of the initial state which are dormant.

Lemma 2.2 (Existence of unhappy nodes). Suppose that τ > 0.5, p* < τ and w is sufficiently large. Then
for every c ∈ ℕ and all sufficiently large n, every state of the process (n, w, τ, p) has more than c unhappy
β-nodes. If in addition τ + p* > 1, every state also has more than c unhappy α-nodes.

Given p, by the law of large numbers with high probability (tending to 1, as n tends to infinity) p* will be
arbitrarily close to p. Hence we may deduce the absence of dormant states (with high probability) in the
case that τ + p > 1.

Corollary 2.3 (Absence of dormant states when τ > 0.5 and τ + p > 1). If p < 0.5 < τ and τ + p > 1 then
with high probability none of the accessible states of the process (n, w, τ, p) is dormant.

It remains to show the accessibility of either a dormant state or complete segregation, from any state of the 
process. An inductive argument can be used in order to prove this fact. 

Lemma 2.4 (Complete segregation or dormant state). From any state of the process (n, w, τ, p) with
0 ≪ w ≪ n there exists a series of transitions to complete segregation or to a dormant state.

Here is a sketch of the proof. If τ < 0.5 the mixing index is strictly decreasing through the transitions, so it
is immediate that the process will reach a dormant state (indeed, 0 is a lower bound for the mixing index).
For the case where τ > 0.5 (which we assume for the duration of this discussion) we can argue inductively,
in four steps. An interval of nodes of the same type is called a contiguous block. First we show that from
a stage with few unhappy nodes of one type (here 5w⁴ is a convenient upper bound of what we mean by
‘few’, which is by no means optimal) there is a series of transitions which lead to either a state with a
contiguous block of length 2w or a dormant state. Second, from a state with a contiguous block of length
at least 2w there is a series of transitions to complete segregation or to a dormant state. Third, from any state which
has at least w⁴ unhappy nodes of each type, there is a series of transitions to a state with a contiguous block
of length at least w. Finally from a state that has a contiguous block of length at least w and at least 4w unhappy
nodes of opposite type from the block, there is a series of transitions to a state with a contiguous block of
length at least 2w. The combination of these four statements constitutes a strategy for arriving at a dormant state
or a state of complete segregation, from any given state. We illustrate this strategy in Figure 6, where two
arrows leaving a node indicate that at least one of these routes is possible.

Figure 7: The evolution of the infected area when τ + p < 1.


3 Reaching complete segregation when τ > 0.5 and τ + p < 1

This case of Theorem 1.2 is challenging because we need to show that the process avoids accessible dormant
states, until it reaches a safe state, i.e. a state from which no dormant state is accessible. The reason for this
avoidance is (in contrast with the case τ + p > 1 of Section 2.2) the dynamics of the process with the given
parameters. The methodology we use is based on a martingale argument, which involves a great deal of the
analytical tools (e.g. the metrics of social welfare) and their properties that were developed in the previous
sections. Having shown that dormant states are avoided until the process reaches a safe state, Lemma 2.4
gives Theorem 1.2 (for the case where τ > 0.5 and τ + p < 1). An overview of this argument is given in
Figure 8.


3.1 The persistence of large contiguous β-blocks

According to our plan, we wish to establish the existence of unhappy nodes of both types until a safe state
is reached. By Lemma 2.2, we do not have to worry about the existence of unhappy β-nodes. One device
that guarantees the existence of unhappy α-nodes is a contiguous block of β-nodes, of length at least w.
Such a block exists in the initial random state (with high probability). One way to argue for its preservation
in subsequent stages is to consider the ratio of the unhappy nodes of the two types. Even more relevant is
the ratio between the number of unhappy α-nodes, and the number of β-nodes which are not just unhappy,
but actually sufficiently unhappy that they can swap with any unhappy α-node.

Definition 3.1 (Very unhappy β-nodes). Given a stage of the process, a node of type β is very unhappy if
there are at least (2w + 1)τ nodes of type α in its neighbourhood. The number of very unhappy β-nodes is
denoted by U*_β.
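
To make the definition concrete, the sketch below lists the very unhappy β-nodes and the unhappy α-nodes of a small ring. The function names and the encoding of types as 'a' and 'b' are our own assumptions. As we read the definition, an unhappy α-node that moves to the site of a very unhappy β-node finds at least (2w + 1)τ nodes of its own type there, which is more than it had, and this is what lets such a β-node swap with any unhappy α-node.

```python
def alpha_count(state, u, w):
    """Number of alpha-nodes in the neighbourhood [u - w, u + w] of site u (indices mod n)."""
    n = len(state)
    return sum(1 for d in range(-w, w + 1) if state[(u + d) % n] == "a")


def very_unhappy_beta_nodes(state, w, tau):
    """Beta-nodes with at least (2w + 1) * tau alpha-nodes in their neighbourhood."""
    threshold = (2 * w + 1) * tau
    return [u for u, t in enumerate(state)
            if t == "b" and alpha_count(state, u, w) >= threshold]


def unhappy_alpha_nodes(state, w, tau):
    """Alpha-nodes with fewer than (2w + 1) * tau alpha-nodes in their neighbourhood."""
    threshold = (2 * w + 1) * tau
    return [u for u, t in enumerate(state)
            if t == "a" and alpha_count(state, u, w) < threshold]


# A block of eight beta-nodes plus one isolated beta-node on a ring of 60 sites.
state = ["a"] * 20 + ["b"] * 8 + ["a"] * 20 + ["b"] + ["a"] * 11
print(very_unhappy_beta_nodes(state, 3, 0.6))  # [48]: only the isolated beta-node
print(unhappy_alpha_nodes(state, 3, 0.6))      # [19, 28]: the alpha-nodes flanking the block
```

In this toy state the β-nodes inside the block are not very unhappy, while the block itself produces unhappy α-nodes at its two ends, in line with the role that long β-blocks play in this section.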

In the case that we study (τ > 0.5 and τ + p < 1) initially, the number of very unhappy β-nodes is Ω(n)
while the number of unhappy α-nodes is o(n). The following lemma says that as long as this imbalance is
preserved, it is very likely that a sufficiently long contiguous block of β-nodes is preserved.

Lemma 3.2 (Persistent β-block). Consider the process (n, w, τ, p) with τ > 0.5 and let s* be the least stage
where the ratio between the very unhappy β-nodes and the unhappy α-nodes becomes less than 4w² (putting
s* = ∞ if no such stage exists). Then with high probability there is a β-block of length at least 2w at all stages
< s* of the process.

Figure 8: The logic of the proof that if τ > 0.5, with high probability the process reaches complete segregation.

Since a β-block of length at least w is a guarantee for unhappy α-nodes, we get the following corollary.

Corollary 3.3 (Conditional existence of unhappy α-nodes). Under the hypotheses of Lemma 3.2, with high
probability there are unhappy α-nodes at all stages < s* of the process.

It remains to construct an elaborate martingale argument in order to show that the imbalance between the
number of very unhappy β-nodes and the number of unhappy α-nodes persists for a sufficiently long time
(until the process reaches a safe state).

3.2 Infected area view of the Schelling process 

In the case of unbalanced happiness (i.e. when τ > 0.5, τ + p < 1, see Table 4) the unhappy α-nodes
are initially very rare, so the interesting activity (namely α-to-β swaps) occurs in small intervals of the
entire population (at least in the early stages). These intervals contain the unhappy α-nodes, and gradually
expand, while outside these intervals all β-nodes are very unhappy. Figure 11 shows the development of
this process, where the height of the nodes (perpendicular lines) is proportional to the number of α-nodes in
their neighbourhood and the horizontal black line denotes the threshold where an α-node becomes unhappy.
Hence nodes with a high proportion of α-nodes in their neighbourhood will be higher than the nodes with a low
proportion of α-nodes in their neighbourhood. The three horizontal bars are snapshots of the process, and
show cascades forming, originating from the initially unhappy α-nodes. Figure 7 shows the same process,
with the current state in the outer circle, and with swaps represented by a dot at a distance from the centre
which is proportional to the stage where the swap occurred. These cascades that spread the unhappy
α-nodes are due to the following domino effect. An unhappy α-node moves out of a neighbourhood, thus
reducing the number of α-nodes in that interval. This in turn often makes another α-node in the interval
unhappy, which can move out at a later stage, thus causing another α-node nearby to be unhappy, and so
on. The expanding intervals are the infected segments which start their life as incubators. For the sake of
simplicity, we omit the formal definitions of these notions, which can be found in the appendix. Roughly
speaking, incubators are small intervals that surround the unhappy α-nodes in the initial state. Moreover
they are defined in such a way that every β-node that is outside the incubators is very unhappy in the initial
state. During the process, as we discussed above, these expand into larger infected segments, so that at each
stage every unhappy α-node is inside an infected segment. The union of all infected segments is called
the infected area. At any stage, every β-node outside the infected area is very unhappy and every α-node
outside the infected area is happy. It is not hard to show that if τ + p < 1, the probability that a node belongs
to an incubator is e^{−Θ(w)}. Hence with high probability the number of incubators as well as the number of
nodes belonging to incubators of the process (n, w, τ, p) is n·e^{−Θ(w)}.

It turns out that the number of unhappy β-nodes in an interval of nodes is conveniently bounded in terms
of the number of α-nodes in the interval. This means that if the number of α-nodes in the infected area
remains o(n), then the number of unhappy β-nodes in the infected area also remains o(n). In order to give
a clear sketch of the argument depicted in Figure 8 (for the current case when τ > 0.5 and τ + p < 1) let
us define the global variables in Table 9 (for the current discussion we will not be concerned with D_s or its
definition). Note that U_s ≤ G_s + Y_s + Z_s. A combinatorial argument can be used in order to show
that Y_s ≤ Z_s/(1 − τ) + 2wC. Since 1 + 1/(1 − τ) ≤ 2/(1 − τ), it follows that

U_s ≤ 2wC + G_s + 2Z_s/(1 − τ). (3.2.1)

Table 9: Random variables indicating the number of certain nodes in infected area at stage s of the process.

Z_s    α-nodes in infected area
Y_s    Unhappy β-nodes in infected area
D_s    Anomalous nodes in infected area
P_s    Probability of a bogus swap
G_s    β-nodes outside infected area
C      Nodes inside the incubators

By (2.1.1) we know that a stage where the number of unhappy nodes is less than n r p*/w is a safe stage.
Hence we wish to show that (with high probability) the process will arrive at a stage where each of the
three summands in (3.2.1) is at most n r p*/(3w). We know that C can be bounded appropriately. Our main
argument will show how to obtain a similar bound for Z_s. Note that G_s plays a different role, since it is
initially large and shrinks monotonically (as the infected area expands monotonically). In order to find a
stage where G_s becomes sufficiently small, it is instructive to consider what a typical swap in the process
looks like. At the start of the process the infected area is a very small proportion of the entire ring. The vast
majority of unhappy β-nodes occur outside the infected area, while all unhappy α-nodes are inside the
infected area. It follows that with high probability a swap will involve an α-node in the infected area and a
β-node outside the infected area. A bogus swap is one that is not of this kind.

Definition 3.4 (Bogus swaps). A swap which involves a β-node currently inside the infected area is called
bogus. Given an infected segment I, a bogus swap in I is a swap that moves an α-node into I.

Note that any swap which is not bogus reduces G_s by at least 1. Hence if we show that the bogus swaps have
small probability throughout a significant part of the process, we can ensure that G_s becomes sufficiently
small. In order to be more precise, recall the stopping time s* from Lemma 3.2. We introduce a few more
stopping times, all of which will turn out to be earlier than s* (with high probability). These basically
concern the satisfaction of conditions which will ensure that the mixing index is sufficiently low as to
guarantee a safe state. By (2.1.1) we have mix < U · w(w + 1) and in order to ensure a safe state (by Lemma
2.1) we want mix < n(w + 1) r p*. So we want U < n r p*/w at some stage of the process. Let T_mix be the first
stage which satisfies this condition. Similarly, consider the stopping times T_g, T_stop of the second part of
Table 3 (for simplicity, we will not consider T_y in the present discussion). We use an elaborate martingale
argument in order to show the following.

Lemma 3.5 (Bounding the α-nodes in the infected area). If r > 0.5 and r + p < 1, with high probability we
have Z_s = o(n) and p_s = o(1) for all s < T_g.

This lemma in combination with Corollary 3.2 implies that T_g < s* < T_stop. Hence every stage up to T_g
involves a swap. Then it follows from the second clause of Lemma 3.5 that T_g < n (since G_s is reduced
by at least 1 at every non-bogus swap). Hence by (3.2.1) we have established (with high probability) the
existence of a stage T_g < n such that

\[ U_{T_g} < wC + G_{T_g} + \frac{2 Z_{T_g}}{1 - r} < o(n) + \frac{n r p_*}{4w} + o(n) < \frac{n r p_*}{w}. \]

Table 10: Likelihood of various properties in the initial configuration under certain conditions, when p < 0.5 and r < 0.5

  Property            Probability   Distribution             Likelihood
  Stable α-interval   P_stab        Z_stab ~ B(w, 1 - p)     high if 2r + p < 1, low if 2r + p > 1
  Unhappy α-node      P_unhap       Z_unhap ~ B(2w, p)       always rare


Hence by (2.1.1) we have T_mix ≤ T_g, which means that by stage T_g a safe state has been reached. Then by
Corollary 2.4 the process will reach complete segregation, with probability 1 - o(1).

Corollary 3.6 (Safe state arrival). Suppose that r + p < 1. Then with high probability the process (n, w, r, p)
reaches a safe state by stage n, and eventually complete segregation.

This argument (with the full details given in Section 5) concludes the proof of Theorem 1.2 for the case
r > 0.5. It remains to deal with the case r < 0.5.


4 The case when intolerance is at most 50% 

In this case the behaviour of the process (n, w, r, p) is very different, since the mixing index is strictly
decreasing. This means that the process is bound to arrive at a dormant state, with absolute certainty. Note
that if r < 0.5 then complete segregation is a dormant state, but it can be shown that the final state is never
complete segregation. We show that in most typical cases for p, the outcome is static when r < 0.5. We
assume that p < 0.5 because the case p = 0.5 has already been analysed in [BIKK12, BELP14] and the
case p > 0.5 is symmetric. Hence on the hypothesis r < 0.5 we have p + r < 1 and by Table 4 the unhappy
α-nodes are an arbitrarily small proportion of the α-nodes as w → ∞. In any case, since p < 0.5 < 1 - p
we have r - p < 1 - r - p, so the probability that an α-node is unhappy is much smaller than the probability
that a β-node is unhappy. However what matters in the analysis for r < 0.5 is the relationship between the
likelihood of stable intervals and unhappy α-nodes. This analysis is reminiscent of the work in [BELP14],
but has some new features.

Definition 4.1 (Stable intervals). A stable interval is an interval of nodes of length w which contains at
least (2w + 1)r nodes of one or the other type. An interval is α-stable if it contains at least (2w + 1)r nodes
of type α.

The β-stable intervals are defined analogously. Note that no α-node which is inside an α-stable interval can
swap during the process. The reason is that such α-nodes are happy just because of the presence of the other
α-nodes in the same interval. Then a simple induction shows that they will continue to be happy throughout
the process, thereby remaining immune to swaps and fixed in their initial positions. A similar observation
applies to β-stable intervals. The existence of stable intervals is characteristic of the case r < 0.5.

The events we are interested in are the occurrences of α-stable intervals and unhappy α-nodes. The
probabilities P_stab, P_unhap of these two rare events can be viewed as tails of certain binomial distributions.
Consider the variables, probabilities and distributions of Table 10. It is not hard to see that

\[ P_{stab} = P[Z_{stab} > (2w + 1) r] \quad\text{and}\quad P_{unhap} > P[Z_{unhap} > 2w(1 - r)]. \]

Figure 11: The formation and dynamics of the infected area when r + p < 1.

We are interested in the event where the ratio P_unhap/P_stab becomes small, because of the following fact.

Lemma 4.2 (Static processes). Suppose that r, p are such that P_unhap = O(c^{-w} · P_stab) for some c > 1. Then
with high probability the process (n, w, r, p) is static, and in fact there exists some c* > 1 such that with
high probability the process stops after at most n · (c*)^{-w} many steps.

The intuition here is that, if the unhappy α-nodes are much rarer than the α-stable intervals (i.e.
if P_unhap = o(P_stab)) then it is very likely that unhappy α-nodes are enclosed in small intervals which are
guarded by α-stable intervals. This means that the familiar cascades that can be caused by the eviction of an
unhappy α-node are bound to be contained in small areas of nodes. The very definition of stable intervals
ensures that such cascades cannot pass through them. Hence the condition P_unhap = o(P_stab) guarantees that
any α-to-β swaps are contained in small areas of nodes of total size o(n). Due to the monotonicity of the
mixing index, this means that there can only be at most o(n) swaps in this case.
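
The comparison between the two binomial tails can be explored numerically. The following is a minimal sketch, not the computation used in the paper: it evaluates the tail P[Z_stab > (2w + 1)r] with Z_stab ~ B(w, 1 - p) and the tail P[Z_unhap > 2w(1 - r)] with Z_unhap ~ B(2w, p), exactly as these expressions are written above, and reports their ratio. The function names and the sample values r = p = 0.3 are illustrative assumptions.

```python
from fractions import Fraction
from math import comb, floor


def upper_tail(n, q, threshold):
    """P[X > threshold] for X ~ Binomial(n, q), summed term by term."""
    q = float(q)
    first = floor(threshold) + 1  # smallest integer strictly above the threshold
    return sum(comb(n, k) * q**k * (1 - q) ** (n - k)
               for k in range(max(first, 0), n + 1))


def p_stab(w, r, p):
    # Tail of Z_stab ~ B(w, 1 - p) beyond (2w + 1) r.
    return upper_tail(w, 1 - p, (2 * w + 1) * r)


def p_unhap_tail(w, r, p):
    # Tail of Z_unhap ~ B(2w, p) beyond 2w (1 - r).
    return upper_tail(2 * w, p, 2 * w * (1 - r))


def tail_ratio(w, r, p):
    """Ratio of the unhappy-node tail to the stable-interval tail."""
    return p_unhap_tail(w, r, p) / p_stab(w, r, p)


if __name__ == "__main__":
    # Exact rationals avoid floating-point ambiguity in the thresholds.
    r = p = Fraction(3, 10)  # illustrative values only
    for w in (10, 20, 40):
        print(w, tail_ratio(w, r, p))
```

For these sample parameters the ratio shrinks quickly as w grows, which is the qualitative behaviour that the hypothesis of Lemma 4.2 asks for.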

The second item in Figure 2 shows the probabilities P_stab, P_unhap (for w = 100) with respect to r, p. We see
that for points away from (0.5, 0.5), the surface P_unhap is above P_stab, and there is a threshold curve beyond
which the opposite relationship is established. Using basic results about the tail of the binomial distribution,
and Stirling’s approximation we can derive the following sufficient condition for P_unhap = o(P_stab):

\[ g(r,p) > 0, \quad\text{where}\quad g(r,p) = \frac{1}{2}\left(\frac{(1-r)^{1-r}}{(0.5-r)^{0.5-r}}\right)^{2} - p. \tag{4.0.2} \]

The third item of Figure 2 is a representation of g(r, p) in the space, up to where it becomes negative, at
which point we project it on the plane. The values of r, p that we are interested in correspond to points on
the plane, outside the collapsed area. This boundary (a curve) is clearer in the first item of Figure 2
which is the projection of the surface to the plane, with different colours indicating the points which make g
positive or negative. This boundary can be simplified (with slight loss of generality) if we consider the line
that passes through the two points where the boundary curve intersects the lines r = 0.5 and p = 0.5. Hence
if 2p · (1 - 2κ₀) + r + κ₀ < 1, we are in the stable region, which shows a clause of Theorem 1.2. Note that
both of the partial derivatives of g are negative when r, p ∈ [0, 0.5). If we fix p = 0.5 then the largest value
of r that keeps g(r, p) > 0 is the solution (κ₀ ≈ 0.353092313) of the first equation of Table 12. Hence we
may conclude that if r < κ₀ and p ∈ (0, 0.5] then P_unhap = O(c^{-w} · P_stab) for some c > 1. We can also look
for the largest square that is contained in the large area of the first item of Figure 2 (where the process is
static). The edge of this square is given in Table 12. Hence if p, r ∈ (0, λ₀) then P_unhap = O(c^{-w} · P_stab) for
some c > 1.
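
The constant κ₀ and the linear simplification of the boundary can be checked numerically. The sketch below assumes that g has the form g(r, p) = (1/2) · ((1 - r)^{1-r} / (0.5 - r)^{0.5-r})² - p, which is how we read (4.0.2); the function names and the bisection bracket [0, 0.49] are illustrative choices, not taken from the paper.

```python
def g(r, p):
    """Assumed form of g from (4.0.2); defined for 0 <= r < 0.5."""
    base = (1 - r) ** (1 - r) / (0.5 - r) ** (0.5 - r)
    return 0.5 * base**2 - p


def kappa0(tol=1e-12):
    """Largest r with g(r, 0.5) >= 0, found by bisection (g decreases in r)."""
    lo, hi = 0.0, 0.49  # g(lo, 0.5) > 0 > g(hi, 0.5)
    while hi - lo > tol:
        mid = (lo + hi) / 2
        if g(mid, 0.5) > 0:
            lo = mid
        else:
            hi = mid
    return (lo + hi) / 2


def in_linear_stable_region(r, p, k0):
    """The simplified (linear) sufficient condition 2p(1 - 2*k0) + r + k0 < 1."""
    return 2 * p * (1 - 2 * k0) + r + k0 < 1


if __name__ == "__main__":
    k0 = kappa0()
    print("kappa_0 ~", k0)
    print("g just below r = 0.5 at p = 0.25:", g(0.5 - 1e-9, 0.25))
    print("(r, p) = (0.3, 0.3) in linear region:", in_linear_stable_region(0.3, 0.3, k0))
```

The line 2p(1 - 2κ₀) + r + κ₀ = 1 passes through (r, p) = (κ₀, 0.5) and (0.5, 0.25), the two points where the boundary curve meets the lines p = 0.5 and r = 0.5.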
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Table 12: Threshold constants of interest and their derivation equations. 



We have one last observation to make about the function g. If we do not restrict the values of r ∈ (0, 0.5)
then we wish to find the values of p such that g(r, p) > 0. According to the properties of g (in particular its
negative derivative on p), these are all the positive numbers which are less than the limit (which is also an
infimum)

\[ \lim_{r \to 0.5} \frac{1}{2}\left(\frac{(1-r)^{1-r}}{(0.5-r)^{0.5-r}}\right)^{2} = 0.25. \]

Hence we may conclude that if p < 0.25 and r ∈ (0, 0.5) then P_unhap = O(c^{-w} · P_stab) for some c > 1. This
concludes the proof of the second clause of Theorem 1.2.
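
As a quick check of the value 0.25, here is a worked evaluation under the assumption that the first term of g is (1/2) · ((1 - r)^{1-r} / (0.5 - r)^{0.5-r})², which is our reading of (4.0.2). Since x^x → 1 as x → 0 from above, the denominator tends to 1 as r → 0.5, while the numerator tends to 0.5^{0.5}:

\[ \lim_{r \to 0.5} \frac{1}{2}\left(\frac{(1-r)^{1-r}}{(0.5-r)^{0.5-r}}\right)^{2} = \frac{1}{2}\left(\frac{0.5^{0.5}}{1}\right)^{2} = \frac{1}{2}\cdot\frac{1}{2} = 0.25. \]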
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5 Appendix 


In this section we provide supplementary material to the main part of the paper. This includes mainly 
proofs of the claims we made towards the proof of our main theorem, but also additional introductory 
material, figures, tables and mathematical background. The structure of this supporting material follows 
the presentation of the main part of the paper. 


5.1 Schelling models 

The definition of the Schelling model in Section 1.1 is rather standard, close to the spatial proximity model
from [Sch69, Sch71a] and identical to the model studied in [BIKK12, BELP14]. Most significantly, it is an
unperturbed Schelling model, where agents cannot make moves that are detrimental to their welfare. We
have already remarked in the introduction that various more realistic-looking rigorously analysed perturbed
versions of the model in the literature (such as [Zha04a]) actually force ‘regularity’ on the process, which
makes it fit an already existing methodology (such as Markov chains with a unique stationary distribution,
or with properties that guarantee stochastically stable states). Even if we commit to the absence of
perturbations in the model, it is possible to add complications to the simple dynamics defined in Section 1.1.
For example, the agents may take into account the distance they need to travel before they move. However
it is the simplicity of the original Schelling model, contrasted with the complexity of the analysis required
to specify its behaviour (as demonstrated in [BIKK12, BELP14]), that makes this topic fundamental and
interesting.

Under the above requirement for simplicity and proximity to the original model, there remain a number of
ways that the model can be altered or generalised. For example, note that in the case that r > 0.5 in the
model of Section 1.1, two nodes may swap although the number of same-type nodes in their neighbourhoods
remains the same after the swap. One may alternatively require that for such a swap, the corresponding
numbers of same-type nodes in the neighbourhoods increase (note that such a modification would not make
a difference if r < 0.5). Our choice on this issue follows Brandt, Immorlica, Kamath, and Kleinberg in
[BIKK12, Section 2]. One generalisation, considered in [BELP15], is to allow different tolerance thresholds
for the two types of individuals. Another generalisation, already present in [Sch69], is to introduce a
number of vacancies, i.e. to allow the total number of individuals to be smaller than the number of sites.
We could also alter the dynamics. Instead of switching two chosen individuals at each stage, we could
merely choose one individual and change his type. Such an action may be interpreted as the departure of
the individual to some external location and the arrival of an individual of the opposite type at the site that
has just become available. Models with these dynamics are often said to have switching agents (see [BELP15],
where such a model was analysed) as opposed to the swapping agents of the model of Section 1.1.

It is worth pointing out that the Schelling model with switching agents is closely related to the spin-1 
models used to analyse phase transitions in physics, and in particular the Ising model. Indeed, in the Ising 
model (originally introduced in order to explain ferromagnetism in the context of temperature) a system of 
atomic nuclei interact with an auxiliary ‘heat bath’ which affects their spin. Such connections have been 
analysed by many authors (see for example [SS07, DCM08, PW01, LG09, 008]), where the dynamics is 
based on the Boltzmann distribution on the set of possible configurations. A rough analogy between the 
two models is that ‘energy’ corresponds to some measure of the mixing of types (see the definition of the 
mixing index for the Schelling model below) and ‘temperature’ corresponds to the intolerance parameter r 
(at least insofar as phase transitions refer to varying values of the temperature or r). On the other hand, the
Schelling model with closed dynamics has a counterpart in the Ising model with Kawasaki dynamics. 
  Intolerance   r ∈ [0, κ₀)   r ∈ (κ₀, 0.5)   r = 0.5      r ∈ (0.5, 1]
  Segregation   Negligible    Exponential     Polynomial   Complete


Table 13: Segregation regions in the case p = 0.5.


5.2 Objectives of the analysis of the unperturbed model and related work 

We use the notation of Section 1.1, so that the symbol n always means the population variable of the process,
and w always is the parameter of the process which determines the length of the neighbourhood of nodes. 
Similarly, r,p always refer to the parameters of the Schelling process. 

In Section 5.12 we show that, with probability one, the process (n, w, r, p) either reaches complete
segregation or it reaches a dormant state. In the second case, we wish to determine the extent of segregation in the
dormant state. In view of the large number of states that the process may have (most of them ‘random’) a
question arises as to how to classify or even talk precisely about different states that may be the outcome
of the process. Brandt, Immorlica, Kamath, and Kleinberg noticed in [BIKK12] that, at least in the case
r = p = 0.5 that they considered, the extent of the segregation that occurs in the final state depends crucially
on w. In fact, they showed that the dependence on w is ‘polynomial’. We may say that a state is regarded as
polynomial segregation if, with high probability, a randomly chosen node belongs to a contiguous block of
size that is proportional to the value of a polynomial on w. A similar definition applies to exponential
segregation. These two notions turn out to provide a very useful language for explaining the eventual outcome
of the Schelling process. A full characterization (extending the work of Brandt, Immorlica, Kamath, and
Kleinberg [BIKK12]) of the asymptotic behaviour of the process (n, w, r, p) for p = 0.5 and r ∈ [0, 1] was
provided by the authors in [BELP14] in terms of polynomial and exponential segregation, as well as static
processes. Intuitively, a random state is non-segregated, while polynomial and exponential segregation
correspond to highly non-random states. The characterization from [BELP14] is summarized in Table 13. It
is rather striking that when intolerance is increased from, say, 0.4 to 0.5 the segregation is decreased. This
phenomenon is akin to the many paradoxes that stem from the missing link between local motives of agents
and global behaviour of a system (e.g. see Schelling’s classic monograph [Sch78], and in particular Chapter
4 which relates to his segregation models). Even more strikingly, the authors showed in [BELP14] that the
paradox occurs for all r ∈ (κ₀, 0.5), i.e. as r approaches 0.5 the segregation (in the final state) decreases.

This paradoxical phenomenon is also clear in many simulations of the model. Figure 14 shows typical runs
of the processes (5 · 10^5, 3 · 10^3, r, 0.5) for r ∈ {0.485, 0.49, 0.495, 0.5}. The final state is depicted in the
circle, where the nodes of one type are black and the nodes of the other type are grey. We use the space
between the centre of the ring and the ring in order to record the actual process, as it evolves in time. In
particular, if a grey node switches its place with a black node, we put a black node (the colour of the more
recent node) between the location of the node and the centre of the ring, at a distance from the centre which
is proportional to the stage where the swap occurred. Hence we may observe ‘cascades’ of swaps of nodes
of the same type, which are less severe as r approaches 0.5. Such cascades are crucial in the rigorous
analysis of the model, both in [BIKK12] and in [BELP14]. Figure 14 shows that as r approaches 0.5, the
segregation is decreased. This behaviour can be traced to the probability that a node is unhappy in the
initial configuration, and in fact, the threshold constant κ₀ is derived by comparing related probabilities in
[BELP14].

In the case p = 0.5 the two constants κ₀ and 0.5 mark phase transitions in the limit state of the process
(n, w, r, p), as r takes values in [0, 1]. This brings us to another important objective of the analysis of the
Schelling process, which is the discovery of phase transitions with respect to the parameters r, p.
Incidentally, we note that the discovery of phase transitions has been one of the original motivations for the study of
the one and two dimensional Ising model, when one varies the temperature (see the end of Section 5.1 for a
brief discussion of the analogy between the Ising and the Schelling models). Finally we are also interested
in the expected time that the process takes to converge.

Figure 14: 500K population with w = 3000, p = 0.5 and r = 0.485, 0.49, 0.495, 0.5. All made about 130K swaps.


5.3 Asymptotic notation 

We use the asymptotic notation. Given two functions f, g on the positive integers, (as is standard) we say
that f is O(g) if there exists a positive constant c such that f(t) < c · g(t) for all t. We say that g is Ω(f) if
f is O(g), and that g is Θ(f) if both f is O(g) and f is Ω(g). We also use this notation, however, in a more
general sense: we say that f is g(O(t)) if there exists some c > 0 such that f(t) < g(ct) for all t. For example,
when we say that a function f is n e^{-O(t)}, this means that there is c > 0 such that f(t) < n e^{-ct} for all t. Or, if
we say that f is n(1 - e^{-O(t)}), this means that there is c > 0 such that f(t) < n(1 - e^{-ct}) for all t. Similarly,
we use Θ in a more general sense. We say that f is g(Θ(t)) to mean that there exist constants c_0 and c_1
such that g(c_0 · t) < f(t) < g(c_1 · t) for all t. We say that f = o(g) if lim_t f(t)/g(t) = 0. The (often hidden)
variable underlying the asymptotic notation in the various expressions will be w. In other words, for fixed
values of p and r, the choice of constants required in the asymptotic notation will always depend only on
w. We also combine the ‘high probability’ terminology with the asymptotic notation in a manner which is
worth clarifying. When we say, for example, that ‘with high probability the number of initially unhappy
α-nodes in the process (n, w, r, p) is n · (1 - p) · e^{-Θ(w)}’, this means that there exist constants c_0 and c_1 such
that, with high probability, the number of initially unhappy α-nodes in the process (n, w, r, p) lies between
n · (1 - p) · e^{-c_0 · w} and n · (1 - p) · e^{-c_1 · w}.


5.4 Overview of our analysis 

We use different methods for the cases r < 0.5 and r > 0.5. If r < 0.5, in order to derive conditions under 
which the process is static, we analyse and compare the probabilities of initially unhappy nodes and stable 
intervals. This approach was introduced by the authors in [BELP14]. If r > 0.5 we consider the two cases 
r + p < 1 and r + p > 1 and argue (using distinct arguments) that in each of them complete segregation is 
the high probability outcome. We elaborate on these arguments. 
Figure 15: The logic of the proof that if r > 0.5, with high probability the process reaches complete segregation. Here ‘β-block’ refers to the
persistent β-block of Section 3.1.


Case r > 0.5

This case is divided into the cases r + p > 1 and r + p < 1, and the structure of the analysis was depicted as
a flowchart in Figure 8. Here we give a more detailed overview, which is illustrated in the more elaborate 
flowchart of Figure 15. First, we show that asymptotically (on w,n), from any state there is a series of 
transitions that leads to either a dormant state, or complete segregation. Hence, since there are only finitely 
many states, with probability one the process will reach either a dormant state or complete segregation. 
So in order to establish complete segregation as the eventual outcome, it suffices to show that the process 
maintains unhappy nodes of each colour during all stages. 

First, assume that r + p > 1. In this case we can show that, assuming that the actual proportion of β-nodes
is sufficiently close to p (which is very likely according to the law of large numbers), every reachable state
is not dormant. More precisely, we show that given such numbers of α and β-nodes, every permutation of
them on the ring corresponds to a state which has both unhappy α and unhappy β-nodes. Since the numbers
of nodes of each type do not change during each transition, this argument suffices for this case. Recall that
states with the property that no series of transitions from them leads to dormant states are called safe. So,
in the case r + p > 1 we argue that (with high probability) the initial state is safe.

Second, we assume that r + p < 1, which is a considerably harder case. Under this hypothesis, in the
initial configuration we have o(n) many unhappy α-nodes and Ω(n) many unhappy β-nodes. As before,
it suffices to show that (with high probability) the process never reaches a dormant state. It is not hard to
see that (with high probability) the initial state is not dormant. However it is no longer clear if the initial
state is safe. We show that given the expected numbers of nodes of the two types in the initial state (or
numbers sufficiently close to their expectations) any permutation of the nodes on a ring corresponds to a
state with at least one unhappy β-node. Hence, with high probability, the process will never run out of
unhappy β-nodes and we only need to argue about the preservation of unhappy α-nodes. Already it should
be clear that this is an asymmetric case where the α-nodes (the majority) and the β-nodes (the minority)
play different roles. When r + p < 1 there are many permutations of the nodes which correspond to states
where all α-nodes are happy, i.e. dormant states. So the argument that was used in the case r + p > 1 is
no longer relevant for arguing for the preservation of unhappy α-nodes in the process. The argument we
use instead is based on the asymmetry between the number of unhappy β-nodes and the unhappy α-nodes,
which creates a dynamic that favours the preservation of unhappy α-nodes. More precisely, it favours the
preservation of β-blocks of length at least w, which is a condition implying the existence of unhappy α-nodes
(indeed, the α-nodes neighbouring a β-block of length at least w are unhappy). Hence if we show that the
expected number of unhappy α-nodes remains small during the stages of the process, then we have that we
can expect the existence of unhappy α-nodes (and unhappy β-nodes) up to the point where the total number
of unhappy nodes is small.
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In addition we show that if the total number of unhappy nodes in a state is sufficiently small, then this 
state is safe, i.e. there is no series of transitions from it to a dormant state. The argument is concluded by 
showing that it is very likely that by stage n the process will arrive at a state with appropriately low number
of unhappy nodes, before it reaches a dormant state. Figure 5 is a plot of the numbers of unhappy α-nodes
and unhappy β-nodes during the stages, taken from two typical simulations (one with large and one with
small population), when r + p < 1. The process we described is clearly visible: the number of unhappy
α-nodes remains small, until the number of unhappy β-nodes becomes small. Up to the latter point, as we
explained, the dynamics favours the preservation of unhappy α-nodes.


Case r < 0.5

In this case we have r + p < 1, and this means that in the initial configuration the α-population is happy
with a few exceptions, while the β-population is unhappy, with a few exceptions. By the definition of the
dynamics of the model, α-to-β swaps can only occur in areas where there are unhappy α-nodes. Hence in
this case the α-to-β swaps will be concentrated in very few selected areas in the ring, at least in the first
stages of the process. This concentration of α-to-β swaps creates cascades of α-node evictions which can be
clearly seen in simulations such as the one displayed in Figure 7 (see footnote 2). If we could argue that such
cascades are restricted to small areas around the initially unhappy α-nodes, then it is not hard to argue that
the process reaches a dormant state rather quickly, having affected only a very small number of nodes. The
way we do this is through stable intervals, a device that was also used in [BELP14]. Roughly speaking, these are
intervals that do not allow the spread of unhappy α-nodes through them.

If p is very small, or if r is very small, then stable intervals occur with high probability. On the other hand,
if p, r get sufficiently large, the probability of a stable interval tends to 0 as w → ∞. This contrasts with the
prevalence of unhappy α-nodes. When r, p are small, the probability of (the occurrence of) an unhappy
α-node is small, while it gets large when r, p increase. Figure 16 shows the actual probabilities (as calculated
in Section 4) as functions of r, p for the specific value of w = 100 (the shape of the plots does not change
significantly for different values of w). The interesting case is the range for r, p where both probabilities
tend to 0 as w → ∞, i.e. both events become rare. Somewhere on the horizontal r-p plane there is a line
marking the intersection of the two surfaces. This is where the probability of a stable interval becomes less
than the probability of an unhappy α-node. Moreover, as w → ∞ the ratio of the two probabilities tends
to infinity or zero, depending on whether r, p sit on one side of the plane (with respect to the intersection line)
or the other. The crux of the argument in Section 4 is that for many values of r, p stable intervals are much
more common than unhappy α-nodes in the initial configuration. This allows us to argue that, in this case,
the process has to reach a dormant state after o(n) many swaps.


5.5 Properties of welfare metrics 

The social welfare V of the state can easily be seen to be non-decreasing along the transitions of the process.
Let us establish the relationship with the mixing index. Given a certain state of the process and a node u,
we let u^α denote the number of α-nodes that are located in the neighbourhood of u at this state. Similarly,
we let u^β denote the number of β-nodes that are located in the neighbourhood of u. Furthermore, we denote
by (α_i) and (β_j) the finite sequences of α and β nodes respectively in the state. Hence α_i^β denotes the
number of β-nodes that are located in the neighbourhood of α_i while β_j^α denotes the number of α-nodes that
are located in the neighbourhood of β_j. Given a state, let n_α, n_β be the number of α and β-nodes respectively.
Then

\[ \sum_{j < n_\beta} \beta_j^{\alpha} = \sum_{i < n_\alpha} \alpha_i^{\beta}. \tag{5.5.1} \]

In order to prove this equality, consider the state of α and β types in the state and start by removing all β
from their positions. Then, adding the β types one-by-one back to their original positions we can see each
placement incurs the same increase to the two sums. Hence by induction, the two sums are equal.

2 Here the current configuration is the outer circle, while the initial random state is the inner small circle. Whenever a swap
occurs at some stage, a dot is placed at a distance from the center which is proportional to that stage, at the same angle where the
involved node lies. The color of the dot corresponds to the type that the node changed to under the particular swap.

Figure 16: The probabilities of a stable interval and an unhappy α-node, as functions of r, p < 0.5 when w = 100.

We call the number in (5.5.1) the mixing index of the state, because it can be used as a metric of how mixed
(i.e. not segregated) the population of α and β types is at the given state. Indeed, suppose that the state has
at least 2w + 1 nodes of each type. In the state of complete segregation the sums in (5.5.1) take the value
2 · (1 + ··· + w), which is w(w + 1). This can be shown to be the minimum mixing index (in a state which has
at least 2w + 1 nodes of each type). At the other extreme, if the two types are uniformly mixed (in the sense
that every interval I has approximately p* · |I| green nodes) then the sums in (5.5.1) take approximately the
value n · 2w · p*(1 - p*), which can be shown to be the maximum possible mixing index. We also have

\[ \sum_{i<n_\alpha} \alpha_i^{\alpha} + \sum_{i<n_\alpha} \alpha_i^{\beta} = (2w+1)\cdot n_\alpha \quad\text{and}\quad \sum_{j<n_\beta} \beta_j^{\beta} + \sum_{j<n_\beta} \beta_j^{\alpha} = (2w+1)\cdot n_\beta. \tag{5.5.2} \]

From (5.5.1) and (5.5.2) we get V = (2w + 1) · n - 2 · mix.
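
These identities are easy to test by brute force on a small ring. The sketch below is an illustration under stated assumptions rather than part of the proof: a state is a list of the letters "a" and "b" on a ring of n ≥ 2w + 1 nodes, the neighbourhood of a node consists of the 2w + 1 nodes within distance w (the node itself included), and the welfare V is taken to be the sum over all nodes of the number of same-type nodes in their neighbourhood. All function names are hypothetical.

```python
import random


def neighbourhood(u, n, w):
    """Indices of the 2w + 1 nodes within distance w of u on a ring of n nodes."""
    return [(u + d) % n for d in range(-w, w + 1)]


def mixing_sums(state, w):
    """The two sums of (5.5.1): opposite-type neighbours seen by a-nodes and by b-nodes."""
    n = len(state)
    a_sum = b_sum = 0
    for u, kind in enumerate(state):
        other = sum(1 for v in neighbourhood(u, n, w) if state[v] != kind)
        if kind == "a":
            a_sum += other
        else:
            b_sum += other
    return a_sum, b_sum


def welfare(state, w):
    """Assumed welfare: total number of same-type nodes over all neighbourhoods."""
    n = len(state)
    return sum(1 for u, kind in enumerate(state)
               for v in neighbourhood(u, n, w) if state[v] == kind)


if __name__ == "__main__":
    rng = random.Random(1)
    n, w = 200, 7
    state = [rng.choice("ab") for _ in range(n)]
    a_sum, b_sum = mixing_sums(state, w)
    print("sums agree:", a_sum == b_sum)
    print("V identity:", welfare(state, w) == (2 * w + 1) * n - 2 * a_sum)
    segregated = list("a" * 100 + "b" * 100)
    print("segregated mixing index:", mixing_sums(segregated, w)[0], "vs", w * (w + 1))
```

On a completely segregated ring with two blocks of length 100 and w = 7, both sums equal w(w + 1) = 56, matching the value stated above.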

Lemma 5.1. If r < 0.5, each step in the process decreases the mixing index by at least 4.

Proof. Suppose that we swap an unhappy $\alpha$-node $u$ with an unhappy $\beta$-node $v$. Let $N_u, N_v$ be the neighbourhoods
of $u, v$ respectively and let $I = N_u \cap N_v$. Here we view the nodes as stationary, so that a swap of
nodes means a swap of their types. The mixing index of the nodes in $I$ will not change after the swap. Since
$\tau < 0.5$ the number $x$ of $\alpha$-nodes in $N_u - I - \{u\}$ is smaller than the number $y$ of $\alpha$-nodes in $N_v - I - \{v\}$. After
the swap the mixing index of each of the $\alpha$-nodes in $N_u - I - \{u\}$ will increase by one while the mixing index
of each of the $\beta$-nodes in the same set will decrease by one. If $t = 2w+1$ is the length of the neighbourhood
and $i$ is the number of $\alpha$-nodes in $I$ then the mixing index of $u$ before and after the swap is $t - x - i$ (the size
of the neighbourhood minus the $\alpha$-nodes in the neighbourhood) and $x + i$ (the number of $\alpha$-nodes in $N_u - I$
plus the number of $\alpha$-nodes in $I \subseteq N_u$) respectively. Hence the difference in the sum of the mixing indices
of the nodes in $N_u - I$ before and after the swap is the sum of

(a) the difference in the mixing index of $u$
(b) the difference in the sum of the mixing indices of the nodes in $N_u - I - \{u\}$

where the differences refer to the stages before and after the swap. For (a) we have $(x + i) - (t - x - i)$. For
(b) there is an increase (by 1) of the mixing index of each $\alpha$-node in $N_u - I - \{u\}$, since $u$ becomes a $\beta$-node.
Moreover there is a decrease (by 1) of the mixing index of each of the $\beta$-nodes (as $u$ ceased to be an $\alpha$-node). Hence
for (b) we have $x - (t - x - i)$. Overall, the difference in the sum of the mixing indices of the nodes in $N_u - I$
before and after the swap is $x - (t - x - i) - (t - x - i) + (x + i) = 4x - 2t + 3i$. A similar argument shows
that the difference in the sum of the mixing indices of the nodes in $N_v - I$ is $2t - 3i - 4y$. Hence overall (and
since the nodes outside $N_u \cup N_v$ maintain the same mixing index before and after the swap) the difference
in the (total) mixing index is $4(x - y)$. Since $x < y$ this means that a decrease by at least 4 occurs due to the
swap. □
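The count $4(x - y)$ can be checked by brute force. The sketch below is an illustration under stated assumptions: the total is taken over all nodes (each node contributes the number of opposite-type nodes in its neighbourhood, i.e. both sums in (5.5.1) together), types are encoded as 'a' and 'b', and the swap is tried for every pair of an $\alpha$-node and a $\beta$-node, whether or not the nodes are unhappy. The function names are hypothetical.

```python
import random

def nbhd(u, w, n):
    """Set of the 2w+1 sites within ring distance w of site u."""
    return {(u + d) % n for d in range(-w, w + 1)}

def total_mixing(state, w):
    """Sum over all nodes of the number of opposite-type nodes in the neighbourhood."""
    n = len(state)
    return sum(1 for u in range(n) for v in nbhd(u, w, n) if state[v] != state[u])

def swap_change(state, w, u, v):
    """Return (observed change, 4 * (x - y)) for swapping the alpha-node u with the beta-node v."""
    n = len(state)
    assert state[u] == "a" and state[v] == "b"
    Nu, Nv = nbhd(u, w, n), nbhd(v, w, n)
    I = Nu & Nv
    x = sum(1 for z in Nu - I - {u} if state[z] == "a")
    y = sum(1 for z in Nv - I - {v} if state[z] == "a")
    swapped = list(state)
    swapped[u], swapped[v] = "b", "a"
    return total_mixing(swapped, w) - total_mixing(state, w), 4 * (x - y)

rng = random.Random(1)
n, w = 40, 3
state = [rng.choice("ab") for _ in range(n)]
alphas = [i for i, t in enumerate(state) if t == "a"]
betas = [i for i, t in enumerate(state) if t == "b"]
results = [swap_change(state, w, u, v) for u in alphas for v in betas]
print(all(observed == predicted for observed, predicted in results))
```

When both swapped nodes are unhappy and $\tau < 0.5$, the proof gives $x < y$, so the change computed here is at most $-4$.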

In our analysis, one of the basic facts used is that dormant states have at least a reasonably high mixing
index. If we can show that with high probability the process reaches a point where the mixing index is 
too low for dormant states to be accessible, then by Corollary 5.26 we will have shown that with high 
probability complete segregation is the eventual outcome. Proposition 5.3 below provides an appropriate 
bound for the mixing index of dormant states. First we prove a technical lemma, which will then be used 
in the proof of Proposition 5.3. 

Lemma 5.2. Suppose that $\tau > 0.5$, $\rho_* < \tau$, and $0 \ll w \ll n$. In a dormant state of the process $(n, w, \tau, \rho)$
every $\beta$-block has length at most $2\lceil(1-\tau)w\rceil$ and every $\beta$-node is $\lceil(1-\tau)w\rceil$-near to an $\alpha$-node.

Proof. Since the second claim implies the first, it suffices to prove the second claim. By Lemma 5.20 we
can assume that there are unhappy $\beta$-nodes in the given state. For a contradiction, suppose that some $\beta$-node
is not $\lceil(1-\tau)w\rceil$-near to any $\alpha$-node. Consider the $\alpha$-node which is adjacent to the block and to the right of
it. For large $w$, $2\lceil(1-\tau)w\rceil + 1 < w$, meaning that this $\alpha$-node has at least $2\lceil(1-\tau)w\rceil + 1$ nodes of type $\beta$ in
its neighbourhood. Hence the $\alpha$-node has at most $2w - 2\lfloor(1-\tau)w\rfloor$ nodes of type $\alpha$ in its neighbourhood,
which is less than $(2w+1)\tau$. The fact that this $\alpha$-node is unhappy means that the state is not dormant. □

Proposition 5.3 (Mixing in dormant states). Suppose that $\tau > 0.5$, $\rho_* < \tau$, and $0 \ll w \ll n$. The mixing
index in a dormant state of the process $(n, w, \tau, \rho)$ is more than $n(w+1)\tau\rho_*$.

Proof. Suppose that in a dormant state the mixing index is at most $n(w+1)\tau\rho_*$. Since there are $n\rho_*$ nodes of
type $\beta$, there exists such a node $u$ with mixing index at most $(w+1)\tau$. By Lemma 5.2 there exists an $\alpha$-node
$v$ within $\lfloor(1-\tau)w\rfloor$ nodes to the left or to the right of $u$. The number of $\alpha$-nodes in the neighbourhood of $v$
is therefore at most $(w+1)\tau + \lceil(1-\tau)w\rceil$. However this same number must be at least $(2w+1)\tau$ since $v$ is
happy in a dormant state. This holding for arbitrarily large $w$ would imply that $(1-\tau) \ge \tau$, which gives the
required contradiction. □

We do not know if the bound provided by Proposition 5.3 is tight. However it is sufficient for the proof of 
Theorem 1.2, which only requires a bound that is proportional to the population size n. 


5.6 Number of unhappy nodes and maximal blocks 

While a low mixing index suffices to establish the inaccessibility of dormant states, in fact it will often 
be more convenient to work directly with the number of unhappy nodes. The aim of this subsection is to 
allow us to do this, by establishing a fairly tight relationship between the number of unhappy nodes and the 
mixing index. 
As another measure of mixing, we may consider the number $k_\beta$ of maximal contiguous $\beta$-blocks in the
state. Let $\beta_i$ be the $i$th node of type $\beta$ and let $\beta_i^{\alpha}$ denote the number of $\alpha$-nodes in the neighbourhood around
$\beta_i$. Let $[x,y]$ be a finite interval of integers such that $\{\beta_i : i \in [x,y]\}$ constitutes a block (i.e. there is no
$\alpha$-node between $\beta_x$ and $\beta_y$). If $y - x > w$ then $\beta_x^{\alpha} + \cdots + \beta_y^{\alpha}$ is bounded above by $2\cdot(1+\cdots+w) = w(w+1)$.
If $y - x \le w$ the number $w(w+1)$ continues to be a bound for $\beta_x^{\alpha} + \cdots + \beta_y^{\alpha}$. Therefore

$$\sum_{i<n_\beta} \beta_i^{\alpha} \le w(w+1)\cdot k_\beta, \quad\text{where $k_\beta$ is the number of maximal $\beta$-blocks.} \tag{5.6.1}$$


This inequality is a formal expression of the rather obvious fact that the fewer maximal $\beta$-blocks there are,
the less mixed the two types are. By the definition of happy nodes, if $\tau > 0.5$ and $w > (1-\tau)/(2\tau-1)$ then
no two adjacent nodes of different types can both be happy. This means that, as we move around the circle
of nodes, every time we cross the border between a maximal $\beta$-block and a maximal $\alpha$-block we may count
an additional unhappy node. So, provided that $\tau > 0.5$ and $w$ is sufficiently large, the number of maximal
$\beta$-blocks is bounded above by the number $U$ of unhappy nodes in the state. Then by (5.6.1) we get

$$\textsc{mix} \le w\cdot(w+1)\cdot k_\beta \le w\cdot(w+1)\cdot U.$$


Intuitively this inequality says that the only way to have a small number of unhappy nodes is a small mixing
index, i.e. a large degree of segregation. On the other hand we may bound the number of unhappy nodes in
terms of the mixing index. By (5.5.1) and the definition of unhappy nodes

$$U_\alpha\cdot(1-\tau)(2w+1) \le \textsc{mix} \quad\text{and}\quad U_\beta\cdot(1-\tau)(2w+1) \le \textsc{mix}$$

where $U_\alpha, U_\beta$ are the numbers of unhappy nodes of type $\alpha$ and $\beta$ respectively. So

$$\textsc{mix} \le w\cdot(w+1)\cdot k_\beta \le w\cdot(w+1)\cdot U \le \textsc{mix}\cdot\frac{2w(1+1/w)}{(1-\tau)(2+1/w)} \le \textsc{mix}\cdot\frac{2w}{1-\tau}$$

and

$$\frac{1}{w}\cdot\frac{\textsc{mix}}{w+1} \le k_\beta \le U \le \frac{2}{1-\tau}\cdot\frac{\textsc{mix}}{w+1}$$

which means that if $\tau > 0.5$ (and $w$ is sufficiently large) then $U = \Theta(k_\beta) = \Theta(\textsc{mix})$.
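Inequality (5.6.1) is easy to test on random ring states. The following Python sketch does so under assumptions made only for illustration: types are encoded as 'a' and 'b', the ring has at least $2w+1$ sites, and the function names are hypothetical.

```python
import random

def mixing_index(state, w):
    """Sum over beta-nodes of the number of alpha-nodes within ring distance w."""
    n = len(state)
    return sum(1 for u in range(n) if state[u] == "b"
               for d in range(-w, w + 1) if state[(u + d) % n] == "a")

def maximal_beta_blocks(state):
    """Number of maximal contiguous beta-blocks on the ring."""
    n = len(state)
    if "a" not in state:
        return 1 if n else 0
    # A block starts at each beta-node whose left neighbour (cyclically) is an alpha-node.
    return sum(1 for u in range(n) if state[u] == "b" and state[u - 1] == "a")

def mix_and_bound(n, w, rho, seed):
    """Mixing index of a random state next to the bound w(w+1) * k_beta from (5.6.1)."""
    rng = random.Random(seed)
    state = ["b" if rng.random() < rho else "a" for _ in range(n)]
    return mixing_index(state, w), w * (w + 1) * maximal_beta_blocks(state)

pairs = [mix_and_bound(80, 5, 0.3, seed) for seed in range(20)]
print(all(mix <= bound for mix, bound in pairs))
```

For a completely segregated state such as `"b" * 30 + "a" * 50` with $w = 5$ there is a single $\beta$-block and the bound is attained, in line with the value $w(w+1)$ of the minimum mixing index.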


5.7 Background on probability 

We make use of various concentration of measure inequalities for random variables and (super)martingales.
The simplest of these is Markov's inequality, which says that if $X$ is a non-negative random variable with
$E(X) = \mu$ and $a > 0$ then $P(X \ge a\mu) \le 1/a$. Recall Hoeffding's inequality for independent Bernoulli trials.

Lemma 5.4 (Tight Hoeffding for Bernoulli variables). Let $Z_i$ be independent Bernoulli trials with expected
value $p$, and let $S_k = \sum_{i<k} Z_i$. Then $P[S_k \le k(p-\epsilon)] \le e^{-2\epsilon^2 k}$ and $P[S_k \ge k(p+\epsilon)] \le e^{-2\epsilon^2 k}$ for each $\epsilon > 0$.
If $p < 1/2$ then $P[S_k \ge k(p+\epsilon)] \ge \frac{1}{4}\cdot e^{-2\epsilon^2 k/p}$ for each $\epsilon > 0$ such that $\epsilon < 1 - 2p$.

The second clause of this lemma (the tightness of the inequality) follows from Slud's inequality [Slu77]
(which gives a lower bound on the binomial upper tail in terms of the upper tail of the normal distribution)
and standard lower bounds for the upper tail of the normal distribution (see [Mou10] for more details).
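The bounds of Lemma 5.4 can be compared with exact binomial tails. The sketch below is illustrative only: the function names are hypothetical, exact rationals are used so that the thresholds $k(p \pm \epsilon)$ are not disturbed by rounding, and the parameter values are arbitrary choices satisfying $p < 1/2$ and $\epsilon < 1 - 2p$.

```python
from fractions import Fraction
from math import ceil, comb, exp, floor

def binomial_pmf(k, p, j):
    """P[S_k = j] for S_k ~ Binomial(k, p)."""
    return comb(k, j) * p**j * (1 - p)**(k - j)

def tails_and_bounds(k, p, eps):
    """Exact tails of S_k ~ Binomial(k, p) next to the bounds stated in Lemma 5.4."""
    upper = sum(binomial_pmf(k, p, j) for j in range(ceil(k * (p + eps)), k + 1))
    lower = sum(binomial_pmf(k, p, j) for j in range(0, floor(k * (p - eps)) + 1))
    hoeffding = exp(-2 * eps**2 * k)
    # Lower bound of the second clause (stated for p < 1/2 and eps < 1 - 2p).
    tightness = 0.25 * exp(-2 * eps**2 * k / p)
    return float(upper), float(lower), hoeffding, tightness

upper, lower, hoeffding, tightness = tails_and_bounds(20, Fraction(3, 10), Fraction(1, 5))
print(upper, lower, hoeffding, tightness)
```

In this instance both exact tails lie below the Hoeffding bound, and the exact upper tail lies above the lower bound of the second clause.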


Since there are complex dependences amongst the random variables of the Schelling process, we often need
to 'approximate' certain processes with canonical processes like simple random walks. Here a random walk
with respect to the integer-valued random variables $(Z_i)$ is the stochastic process $R_k = r + \sum_{i<k} Z_i$, for some
$r \in \mathbb{N}$. We say that $(R_k)$ is ruined at step $k$ if $k$ is the least number such that $R_k < 0$. The following simple
fact is obtained via a standard coupling argument.

Lemma 5.5 (Random walk simulation). Let $t_0, t_1 \in \mathbb{N}$, let $X_i \in \{-t_0, 0, t_1\}$ be (possibly dependent) random
variables, let $\tilde{X}_i \in \{-t_0, 0, t_1\}$ be independent Bernoulli trials and let $Y_k = \sum_{i<k} X_i$, $\tilde{Y}_k = \sum_{i<k} \tilde{X}_i$ be the
associated random walks. Provided that, no matter what occurs at stages prior to $i$, at stage $i$ we have
$P[X_i = -t_0] \le P[\tilde{X}_i = -t_0]$ and $P[X_i = t_1] \ge P[\tilde{X}_i = t_1]$, then for all $k, x \in \mathbb{N}$ the probability that $(Y_i + x)$ is
ruined by step $k$ is bounded above by the probability that $(\tilde{Y}_i + x)$ is ruined by step $k$.

The following fact about biased random walks is folklore. 

Lemma 5.6 (Biased random walks). Let $t_0, t_1, r \in \mathbb{N}$, and let $X_i \in \{-t_0, 0, t_1\}$ be (possibly dependent)
random variables such that at stage $i$, no matter what has occurred at previous stages, $P[X_i = t_1 \mid X_i \ne 0] \ge
t_0/(t_0 + t_1) + \delta$ for some $\delta > 0$. Let $Y_j = r + \sum_{i<j} X_i$ be the associated random walk. Then the probability
that $(Y_j)$ is ever ruined is bounded above by $e^{-2r\delta^2/t_0}/(1 - e^{-2\delta^2})$.


Proof. Let $Z_i \in \{-t_0, t_1\}$ be independent variables such that $P[Z_i = -t_0] = t_1/(t_0 + t_1) - \delta$. Let $G_j = r + \sum_{i<j} Z_i$
be the associated random walk. Then $P[X_i = -t_0] \le P[Z_i = -t_0]$, so by Lemma 5.5 it suffices to show that
the probability that $(G_j)$ is ruined is bounded above by $e^{-2r\delta^2/t_0}/(1 - e^{-2\delta^2})$.

We may view the $Z_i$ as independent Bernoulli trials, where $Z_s = t_1$ is viewed as success and $Z_s = -t_0$ is viewed
as failure. Let $p = P[Z_i = t_1]$, so $p = t_0/(t_0 + t_1) + \delta$. If $k_s$ is the number of successes up to step $s$, then
$G_s = r + t_1 k_s - (s - k_s)t_0$, so ruin of the random walk $(G_j)$ at step $s$ implies that $k_s < (t_0 s - r)/(t_0 + t_1)$. We
may use Lemma 5.4 in order to bound the probability of this event. If we let $\delta_s = p - (t_0 - r/s)/(t_0 + t_1)$,
note that $\delta_s \ge p - t_0/(t_0 + t_1) = \delta$, so by Lemma 5.4, $e^{-2\delta^2 s}$ is an upper bound for the probability that $(G_j)$
is ruined at step $s$. Next, note that $(G_j)$ can only be ruined at stages $\ge r/t_0$. Hence

$$\sum_{s \in [r/t_0,\, m)} e^{-2\delta^2 s} \le \frac{e^{-2r\delta^2/t_0}}{1 - e^{-2\delta^2}}$$

is an upper bound on the probability that $(G_j)$ is ever ruined (before stage $m$), which concludes the proof. □
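For independent steps the ruin probability can be computed exactly up to a finite horizon by propagating the distribution of the walk, which gives a way to compare it with the bound of Lemma 5.6. The following Python sketch is an illustration only: the function names, the horizon and the parameter values are hypothetical, and steps are taken without the value 0 (as for the variables $Z_i$ in the proof).

```python
from math import exp

def ruin_probability(r, t0, t1, p, horizon):
    """P[walk from r with steps +t1 (prob p) / -t0 (prob 1-p) drops below 0 within `horizon` steps]."""
    alive = {r: 1.0}   # probability mass of the walks that are not yet ruined, by position
    ruined = 0.0
    for _ in range(horizon):
        nxt = {}
        for pos, mass in alive.items():
            up, down = pos + t1, pos - t0
            nxt[up] = nxt.get(up, 0.0) + mass * p
            if down < 0:
                ruined += mass * (1 - p)
            else:
                nxt[down] = nxt.get(down, 0.0) + mass * (1 - p)
        alive = nxt
    return ruined

def lemma_bound(r, t0, delta):
    """The bound e^{-2 r delta^2 / t0} / (1 - e^{-2 delta^2}) of Lemma 5.6."""
    return exp(-2 * r * delta**2 / t0) / (1 - exp(-2 * delta**2))

# t0 = t1 = 1 and delta = 0.3, so the success probability is p = 1/2 + 0.3 = 0.8.
print(ruin_probability(20, 1, 1, 0.8, 400), lemma_bound(20, 1, 0.3))
```

For small $r$ the bound exceeds 1 and carries no information; it becomes useful once $r\delta^2/t_0$ is large.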


Our analysis depends on various exponential bounds that we can obtain on the expectations of certain
parameters (e.g. the number of unhappy $\alpha$-nodes). The following fact will be routinely used in order to
express such bounds in a canonical form. In the following statement the variables $Z_s$ concern stage $s$ of the
Schelling process $(n, w, \tau, \rho)$ and the constants $q, q', p$ are independent of $n, w$.

Lemma 5.7 (Expectation bounds). Let $f$ be a polynomial, $p < 1$ and $Z_s$ random variables such that
$E(Z_s) \le np$ for all $s$. If $E(Z_s) \le n\cdot f(w)\cdot e^{-wq}$ for some $q > 0$, all $s$ and all sufficiently large $w$, then
there exists $q' > 0$ such that $E(Z_s) \le n\cdot e^{-wq'}$ for all $w, s$.


Proof. Since $f$ is a polynomial, we can choose $q_0 > 0$ and $w_0$ such that $n f(w) e^{-wq} \le n e^{-wq_0}$ for all $w > w_0$.
Hence $E(Z_s) \le n\cdot e^{-wq_0}$ for all $s$ and all $w > w_0$. We may choose $q' < q_0$ such that $p \le e^{-wq'}$ for all $w \le w_0$.
Then by the assumption on $p$ we have that $E(Z_s) \le n e^{-wq'}$ for all $w$ and all $s$. □


The binomial distribution with $t$ trials and success probability $p$ is denoted by $B(t, p)$, and $Z \sim B(t, p)$ means
that the random variable $Z$ follows this distribution. Stirling's formula asserts that $n! \sim \sqrt{2\pi}\, n^{n+\frac{1}{2}} e^{-n}$, i.e. that the
ratio of the two expressions tends to 1 as $n$ tends to infinity.
Lemma 5.8 (Stirling's approximation). There exists a polynomial $y \mapsto p(y)$ such that for all $k \in \mathbb{N}$ and all
$x \in \mathbb{R} \cap (0, k)$

$$\text{there exists } q \in \left(\frac{1}{p(k)},\, p(k)\right) \text{ such that } \binom{k}{\lceil x \rceil} = q\cdot\binom{k}{\lfloor x \rfloor}.$$

Proof. Let $z = x$ or $z = k - x$. Also let $z' = \lceil z \rceil$ or $z' = \lfloor z \rfloor$. Then according to the definition of the binomial
coefficient it suffices to show that there exists a polynomial $y \mapsto r(y)$ such that

$$z'! = q\cdot z! \quad\text{for some } q \in \left(\frac{1}{r(k)},\, r(k)\right).$$

Note that there exists $\delta \in (-1, 1)$ such that $z' = z + \delta$. Then

$$(z+\delta)^{z+\delta+\frac{1}{2}} = z^z\cdot(z+\delta)^{\delta+\frac{1}{2}}\cdot\left(1 + \frac{\delta}{z}\right)^{z}.$$

The second term on the right side of the equation is bounded by a polynomial in $k$ while the third term is in
$(e^{-1}, e)$. Hence there is a quadratic polynomial $y \mapsto r(y)$ such that

$$(z+\delta)^{z+\delta+\frac{1}{2}} \in \left(z^z\cdot r(k)^{-1},\; z^z\cdot r(k)\right).$$

By Stirling's approximation it follows that there exists a quadratic polynomial $y \mapsto p(y)$ such that for all $k$,
$x < k$ and $z, z'$ as defined above there exists $q \in (1/p(k), p(k))$ such that $z! = q\cdot z'!$. This fact, along with
the definition of the binomial coefficient, implies the required statement. □

In our analysis of the Schelling process for the case when $\tau < 0.5$ we will need to compare the tails of
different binomial distributions. There are a number of ways of doing this (including using approximations
with the normal distribution) but the simplest is the following elementary fact from [Bol01, Theorem 1.1].

Lemma 5.9 (Tails of the binomial distribution). Suppose that $X_N \sim B(N, p)$, $p, k \in (0, 1)$ and for all
sufficiently large $N$, $(1 + k(1-p)/p)\cdot h(N) \ge N \ge h(N) \ge p\cdot N > 0$, where $h : \mathbb{N} \to \mathbb{N}$. Then

$$P[X_N = h(N)] \le P[X_N \ge h(N)] \le \frac{1}{1-k}\cdot P[X_N = h(N)]$$

for all sufficiently large $N$. In asymptotic notation we have $P[X_N \ge h(N)] = \Theta(P[X_N = h(N)])$.

The combination of this result with Stirling's approximation of the binomial coefficients gives the required
information about the asymptotic behaviour of the ratio of the two binomial probabilities of interest (unhappy
nodes and stable intervals).
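The comparison in Lemma 5.9 between the upper tail and the single probability at $h(N)$ can be illustrated numerically. In the sketch below the values $p = 0.2$, $k = 0.5$ and the choice $h(N) = \lceil N/3 \rceil$ are hypothetical; they are picked so that $(1 + k(1-p)/p)\cdot h(N) \ge N$ holds. The constant $1/(1-k)$ used for comparison is the one obtained by bounding the ratio of consecutive binomial probabilities above $h(N)$ by $k$ and summing a geometric series.

```python
from math import comb

def tail_ratio(N, p, h):
    """P[X >= h] / P[X = h] for X ~ Binomial(N, p)."""
    pmf = [comb(N, j) * p**j * (1 - p)**(N - j) for j in range(h, N + 1)]
    return sum(pmf) / pmf[0]

p, k = 0.2, 0.5
# Hypothetical h: the smallest integer with (1 + k(1-p)/p) * h >= N, which is ceil(N/3) here.
ratios = [tail_ratio(N, p, -(-N // 3)) for N in range(30, 301, 30)]
print(ratios)
```

Each printed ratio lies between 1 and $1/(1-k) = 2$, so in this range the tail and the point probability agree up to a constant factor.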

5.8 Martingales in the Schelling process 

A crucial part of our analysis is based on two supermartingales, one regarding the non-anomalous $\alpha$-nodes
in the infected area, and one regarding the anomalous nodes. The latter is somewhat sophisticated, in the
sense that it is not adapted to the stages of the process. Nevertheless it is a supermartingale relative to a
more general process, and this is sufficient for our analysis. Due to this sophistication, we clarify how we
regard the process $(n, w, \tau, \rho)$ in probabilistic terms, and what we mean by a martingale.

The states of the system are all configurations of $n$ nodes that can have one or the other type. A state $B$ is
accessible from another state $A$ (thought of as an arrow from $A$ to $B$) if an application of a legitimate swap on
$A$ gives $B$. We view the random process as a combination of two parts. The first is the production of the
initial state according to the given probability distribution of the two types. The second is the stochastic
process that starts from the initial state and moves to the next state, choosing uniformly at random from
all the (finitely many) currently accessible states. We denote the initial state by $F_0$ and the state at stage
$s$ by $F_s$. The remaining discussion refers to the second part of the process, where $F_0$ is a constant. The
underlying probability space $\Omega$ is the set of all infinite sequences of states which start with $F_0$ and have
the property that each term is a state which is accessible from its predecessor. We also add into $\Omega$ the finite
sequences of states which start with $F_0$, in which each term is accessible from its predecessor and the last
term is an absorbing state. We view this as a tree, where the $i$th level of the tree (prefixes of points in the
space of length $i$) describes all possible outcomes of the process up to stage $i$. This tree has dead-ends,
namely the absorbing states. The probability measure on $\Omega$ is the uniform one, namely the one induced
by splitting the total measure 1 uniformly and inductively, starting from the root and considering all accessible
paths. Then each $F_i$ can be viewed as a random variable on $\Omega$, which takes any point in the space and outputs
its $i$th term.

A number of other processes will be defined, relative to the process $(F_s)$ which contains all the information.
Clearly $(F_s)$ is memoryless (has the Markov property) since the distribution of $F_i$ only depends on the value
of $F_{i-1}$. The secondary processes that we consider in our analysis (like $Z_s$ or $G_j$) can be seen as recording
only part of the information of the full process $F_0, \ldots, F_s$ up to stage $s$. In general, a process $X_s$ is adapted
to (or defined in terms of) another process $J_s$ if there is a function $f$ such that $f(J_s) = X_s$ for every point in
$\Omega$. Recall that a filtration on $\Omega$ is an increasing sequence of $\sigma$-algebras on $\Omega$. The reader who is used to
working with filtrations (especially with respect to martingales) can equivalently view a process $X_s$ adapted
to another process $J_s$ as $X_s$ adapted to the natural filtration $(\mathcal{J}_s)$ of $(J_s)$: this is the filtration generated by the
inverse images of the Borel sets of $\Omega$, with respect to the variables $J_s$. For example, the natural filtration of
the full process $(F_s)$ is $(\mathcal{F}_s)$, where $\mathcal{F}_s$ is the $\sigma$-algebra generated from the maximal branches of $\Omega$ restricted
to strings of length $s$ or less. Intuitively $\mathcal{F}_s$ can measure all events that can possibly happen up to stage $s$.

In order to show that a certain process is a martingale, we will have to adapt it to another suitable process.
Equivalently, we would have to adapt it to a suitable filtration (which may be different from the standard
filtration $(\mathcal{F}_s)$ that we described above). This is the reason for introducing adapted processes: the simplest
martingale notion corresponds to processes adapted to themselves, and is not sufficient for our proof. Recall
that a process $H_s$ is a supermartingale relative to a Markov process $J_s$ if it is adapted to it and $E[H_{s+1} \mid J_s] \le
H_s$ for all $s$. This means that relative to the set of reals in $\Omega$ which have the particular value of $J_s$ (which is
regarded as fixed) the expectation of $H_{s+1}$ is bounded by $H_s$ (which is a function of $J_s$). This is the standard
definition of conditional expectation in terms of processes. In our analysis we occasionally need to consider
$E[H_{s+1} \mid J_s]$ conditional on a set of reals $A \subseteq \Omega$. We denote this by $E_A[H_{s+1} \mid J_s]$. A stopping time with
respect to a process $(J_s)$ is a random variable $T$ such that the truth of the event $T = k$ (for any integer $k$)
is a function of $J_i$, $i \le k$. If $T$ is a stopping time for $(J_s)$ and $(H_s)$ is a supermartingale with respect to $(J_s)$,
then the stopped process $H_{s \wedge T}$ (which proceeds as $H_s$ up to stage $T$, and then is constantly equal to $H_T$)
is also a supermartingale (with respect to $(J_s)$). Doob's maximal inequality for supermartingales says that
if $(H_s)$ is a non-negative supermartingale with respect to another process $(J_s)$ and $E[H_0] = \mu$, $a > 0$ then
$P[\sup_s H_s \ge a\mu] \le 1/a$.


5.9 Probability in Schelling segregation 

In this section we lay out a general way for arguing about the probability of the various properties that a 
node can have in the initial configuration. 
Definition 5.10 (Rare and common events in the initial configuration). A property of a node in the initial
configuration is called rare (or a rare event) if it holds with probability at most $n\cdot e^{-\delta w}$, for some positive
constant $\delta$ which may depend on $\tau, \rho$ but not on $w, n$. A property whose negation is rare is called common.

Definition 5.11 (Local properties). A local property $P_u$ of a node $u$ in the initial configuration is one that
only depends on the nodes that are at most $f(w)$-far from $u$, where $f$ is a fixed function. In other words the
property is local if given any two nodes $u, v$ such that for all $i \in [-f(w), f(w)]$, $u + i$ is of the same type as
$v + i$, then $P_u$ holds iff $P_v$ holds. In this case we say that $P_u$ is $f$-local.

Note that the two probabilities mentioned in Lemma 5.12 are on different spaces. The first one refers to
the product space where a point is an infinite sequence of initial states. The second one refers to the space of
points on a random initial state.

Lemma 5.12 (Strong law of large numbers for the Schelling process). Given a local property $P_u$ of nodes
in the initial state of the process $(n, w, \tau, \rho)$, with probability one, as $n \to \infty$ the proportion of nodes $u$ that
satisfy $P_u$ tends to the probability of $P_u$.

Proof. Let $p$ be the probability of $P_u$ and let $f$ be the function indicating the area around $u$ on which $P_u$
depends (as in Definition 5.11). We wish to use the strong law of large numbers, so we need to manufacture
a series of independent trials of properties with given expectation. Let $m \in \mathbb{N}$ be a parameter that depends
on $n$ (to be specified shortly). We consider the ring as a union of intervals of length $mf(w) + 2f(w)$ (which
we think of as an interval of length $mf(w)$ with a padding of $f(w)$ nodes on each side). We always assume that
$mf(w) + 2f(w) < n$. Starting from node 0, denote the $i$th such interval by $V_i$, so that $|V_i| = mf(w) + 2f(w)$.
Also, denote the subinterval of $V_i$ that results from deleting the $f(w)$-node prefix and the $f(w)$-node suffix
of $V_i$ by $I_i$. Hence $|I_i| = mf(w)$. Let $M_n \in \mathbb{N}$ be the largest integer such that $M_n(mf(w) + 2f(w)) \le n$, so that
$M_n \to \infty$ as $n \to \infty$ and $n - M_n(mf(w) + 2f(w)) < mf(w) + 2f(w)$. Hence for each $i < M_n$, the intervals $V_i$
are defined and are disjoint. The same is true for the intervals $I_i$, $i < M_n$. Moreover, if $S$ is the set of all nodes,

$$2f(w)M_n \le \Big|S - \bigcup_{i<M_n} I_i\Big| < 2f(w)M_n + mf(w) + 2f(w). \tag{5.9.1}$$

For each $i < M_n$ let $Y_i$ be the number of nodes $u \in I_i$ such that $P_u$ holds, and note that these random
variables are independent. Moreover, by linearity of expectation, $E(Y_i) = pmf(w)$. Recall that $M_n \to \infty$ as
$n \to \infty$. According to the strong law of large numbers,

$$\frac{\sum_{i<M_n} Y_i}{M_n} \to pmf(w) \quad\text{as } n \to \infty, \text{ with probability 1.} \tag{5.9.2}$$

By (5.9.1), the required proportion is

$$\frac{\sum_{i<M_n} Y_i + \xi f(w)\cdot(2M_n + m + 2)}{(M_n + \delta) f(w)(m+2)} = \frac{\frac{1}{M_n}\sum_{i<M_n} Y_i + \xi f(w)\cdot\left(2 + \frac{m+2}{M_n}\right)}{\left(1 + \frac{\delta}{M_n}\right) f(w)(m+2)}$$

where $\delta, \xi$ range in $[0, 1)$ (depending on how close $n$ is to being a multiple of $f(w)(m+2)$). If we take
$0 \ll m \ll n$, the ratio $m/M_n$ tends to 0, so by (5.9.2) the required proportion tends to

$$\frac{pmf(w) + 2f(w)\xi}{f(w)(m+2)} = \frac{pm + 2\xi}{m+2} = \frac{p + 2\xi/m}{1 + 2/m}.$$

Since $0 \ll m$, the required proportion tends to $p$. More formally, we may let $m = \log n$. In this case, as
$n \to \infty$ we have $m/M_n \to 0$ because $(\log n)^2/n$ tends to 0. Moreover $M_n \to \infty$ and $m \to \infty$ when $n \to \infty$,
so the previous argument applies as indicated. □
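The statement of Lemma 5.12 can be observed in a simulation. The sketch below is illustrative and rests on assumptions not taken from the text: the local property chosen is 'the node and both of its immediate neighbours are of type $\beta$' (so it depends only on nodes at distance at most 1 and has probability $\rho^3$), and the values of $n$, $\rho$ and the random seed are arbitrary.

```python
import random

def proportion_with_property(n, rho, seed=0):
    """Fraction of sites u of a random ring such that u-1, u and u+1 are all beta-nodes."""
    rng = random.Random(seed)
    is_beta = [rng.random() < rho for _ in range(n)]   # each site is beta with probability rho
    hits = sum(1 for u in range(n)
               if is_beta[u - 1] and is_beta[u] and is_beta[(u + 1) % n])
    return hits / n

rho = 0.3
for n in (1_000, 10_000, 100_000):
    print(n, proportion_with_property(n, rho), rho**3)
```

As $n$ grows, the observed proportion settles near $\rho^3$, even though the indicator variables of neighbouring sites are dependent.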
The following fact concerns pairs of properties $P$ and $Q$ that a node can have, which may both be rare but
one (say $P$) occurs with much higher probability than the other. It asserts that in this case, a random node $u$
is much more likely to be nearer to a node $v$ satisfying $P$ than a node $t$ satisfying $Q$ (although it may be far
from any node satisfying $P$ or $Q$). In the statement and proof of this result we use $P_u$ as a Boolean random
variable which asserts that '$u$ satisfies $P$' (and similarly with $Q_u$).

Lemma 5.13 (Rare properties in the Schelling ring). Let $P_u, Q_u$ be $\ell$-local properties of nodes in the initial
state (where $\ell = \ell_w$ is a function of $w$) and for each node $u$ let $x_u$ be the first node $v$ to the right of $u$ such that
either $P_v$ or $Q_v$ holds. If $p, \lambda$ are the probabilities of $P_u, Q_u$ respectively, the probability that $P_{x_u}$ holds and there
is no node $v$ with $Q_v$ to the left of and at distance at most $\ell$ from $x_u$ tends to a number $\ge p/(p + \lambda(2\ell+1))$
as $n \to \infty$. An analogous result holds when 'right' is replaced by 'left'.

Proof. Consider a partition of the ring into disjoint neighbourhoods, starting from a node $u_0$ as follows.
Recall that addition of nodes is always modulo $n$. Given $u_0$, suppose inductively that $u_t$ has been
defined. Then define $u_{t+1} = x_{u_t} + 2\ell + 1$. This iteration continues as long as $u_{t+1} < n$. Let $k_n$ be the
number of iterations in this recursive definition (i.e. the number of terms of the sequence $(u_t)$). Consider
the property

$T_u$: $P_{x_u}$ holds and no node $v$ to the left of $x_u$ and at distance at most $\ell$ satisfies $Q_v$.

The sequence $(u_t)$ can be seen as independent trials for this property. Let $\pi_n$ be the proportion of the terms
of $(u_t)$ that satisfy $T_{u_t}$ in a random initial state. Note that $k_n \to \infty$ as $n \to \infty$ with probability 1. If $\pi$ is
the probability of $T_u$, by the strong law of large numbers we have that $\pi_n \to \pi$ as $n \to \infty$ with probability
1. Let $p_n$ be the proportion of nodes that satisfy $P_u$ and let $\lambda_n$ be the proportion of nodes that satisfy $Q_u$.

Note that we view $\pi_n, p_n, \lambda_n$ as random variables that depend on the initial state. Then

$$\frac{p_n}{\lambda_n} \le \frac{(2\ell+1)\pi_n k_n}{(1-\pi_n)k_n} = \frac{(2\ell+1)\pi_n}{1-\pi_n} \;\Rightarrow\; \pi_n \ge \frac{p_n}{p_n + \lambda_n(2\ell+1)}.$$

By Lemma 5.12 we have $p_n \to p$ and $\lambda_n \to \lambda$ as $n \to \infty$, which gives the required asymptotic bound. □

5.10 Initial expectations 

An important part of our analysis relies on the values of the welfare metrics at the initial state. With high
probability, these will be near to their expected values, which we may compute. We start with the mixing
index.

Lemma 5.14. The expectation of the mixing index in the initial state of $(n, w, \tau, \rho)$ is $2nw\rho(1-\rho)$.

Proof. Consider the random variables $\beta_i^{\alpha}$ and note that $E[\beta_i^{\alpha}] = 2w(1-\rho)$ for each $i$. If $n_\beta$ is the number
of $\beta$-nodes, the expectation of the mixing index in the initial state is $n_\beta\cdot 2w(1-\rho)$ by the linearity of
expectation. If we see $n_\beta$ as a random variable, its expected value is $n\rho$. By the rule of iterated expectation,
the expected value of the mixing index is $2nw\rho(1-\rho)$. □
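For a very small ring the expectation in Lemma 5.14 can be computed exactly by enumerating all states. The following Python sketch does this; the encoding (1 for a $\beta$-node, 0 for an $\alpha$-node), the function names and the parameter values are illustrative assumptions, and the ring is assumed to have at least $2w+1$ sites.

```python
from itertools import product

def mixing_index(state, w):
    """Sum over beta-nodes (1) of the number of alpha-nodes (0) within ring distance w."""
    n = len(state)
    return sum(1 for u in range(n) if state[u] == 1
               for d in range(-w, w + 1) if state[(u + d) % n] == 0)

def expected_mixing_index(n, w, rho):
    """Exact expectation over all 2^n initial states, each site being beta with probability rho."""
    total = 0.0
    for state in product((0, 1), repeat=n):
        k = sum(state)   # number of beta-nodes in this state
        total += rho**k * (1 - rho)**(n - k) * mixing_index(state, w)
    return total

n, w, rho = 10, 2, 0.3
print(expected_mixing_index(n, w, rho), 2 * n * w * rho * (1 - rho))
```

The two printed values agree up to floating-point rounding.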

Note that the expected value of the mixing index in the initial state is only slightly smaller than the maximum
possible mixing index $n\cdot(2w+1)\cdot\rho_*(1-\rho_*)$. This is hardly surprising, as a random state will
be almost perfectly mixed, with the occasional non-uniformities that are implied by randomness (e.g. the
existence of contiguous blocks of certain sizes).

Next, we are interested in the expected number of unhappy nodes of each type. It is not hard to see that this
depends on whether $\tau + \rho < 1$ or $\tau + \rho > 1$ (we will not consider the special case where $\tau + \rho = 1$).
Lemma 5.15 (Unhappy $\alpha$-nodes). Given $\rho, \tau$ such that $\rho + \tau < 1$, with high probability the number of
initially unhappy $\alpha$-nodes in the process $(n, w, \tau, \rho)$ is $n\cdot e^{-\Theta(w)}$.

Proof. Let $X_j$ be 1 if the $j$th node $u_j$ in the initial state is of type $\alpha$ and unhappy, and 0 otherwise. By
Lemma 5.12, it suffices to show that $E[X_j]$ is $e^{-\Theta(w)}$. Recall that the nodes are labelled independently,
following a Bernoulli distribution, with the probability of a $\beta$-label being $\rho$. Let $\epsilon = 1 - \rho - \tau$, which is
positive according to our hypothesis. If $u_j$ is an unhappy $\alpha$-node, then the proportion of $\beta$-nodes in its
neighbourhood $N(u_j)$ is larger than $1 - \tau$. Hence the proportion of $\beta$-nodes in $N(u_j) - \{u_j\}$ is larger
than $1 - \tau$, so it is at least $\rho + \epsilon$.

Let $A$ be the event that $u_j$ is an $\alpha$-node and $B$ the event that $u_j$ is unhappy, so that $P[A] = 1 - \rho$ and
$P[A \cap B] = P[B \mid A]\cdot P[A]$. If we see the labels of the nodes in $N(u_j) - \{u_j\}$ as a series of $2w$ independent
Bernoulli trials, by Hoeffding's inequality for Bernoulli trials the probability that the proportion of $\beta$-nodes
is at least $\rho + \epsilon$ is bounded by $e^{-4w\epsilon^2}$. Hence by the above discussion, $P[B \mid A] \le e^{-4w\epsilon^2}$. We may conclude
that $P[X_j = 1] \le (1-\rho)\cdot e^{-4w\epsilon^2}$. Hence $E[X_j] \le (1-\rho)\cdot e^{-4w(1-\tau-\rho)^2}$. Similarly, by Lemma 5.4 we have
$E[X_j] \ge (1-\rho)\cdot e^{-4w(1-\tau-\rho)^2/\rho}/4$. Hence $E[X_j]$ is $e^{-\Theta(w)}$, which concludes the proof. □
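The upper bound in this proof can be compared with the exact probability that a given node is an initially unhappy $\alpha$-node. The sketch below is an illustration under assumptions: an $\alpha$-node is taken to be unhappy when fewer than $\tau(2w+1)$ nodes of its neighbourhood (itself included) are of type $\alpha$, which is how the proofs above use the notion; the function names and the values $\tau = 3/5$, $\rho = 1/5$ are hypothetical.

```python
from fractions import Fraction
from math import comb, exp

def prob_unhappy_alpha(w, tau, rho):
    """Exact P[a given node is an alpha-node and unhappy] in the initial configuration."""
    size = 2 * w + 1
    # b = number of beta-nodes among the 2w other sites; the node itself is alpha.
    tail = sum(comb(2 * w, b) * rho**b * (1 - rho)**(2 * w - b)
               for b in range(2 * w + 1)
               if size - b < tau * size)   # assumed unhappiness condition
    return float((1 - rho) * tail)

def hoeffding_upper(w, tau, rho):
    """The bound (1 - rho) * exp(-4 w eps^2) with eps = 1 - rho - tau."""
    eps = float(1 - rho - tau)
    return float(1 - rho) * exp(-4 * w * eps**2)

tau, rho = Fraction(3, 5), Fraction(1, 5)
for w in (5, 10, 20, 40):
    print(w, prob_unhappy_alpha(w, tau, rho), hoeffding_upper(w, tau, rho))
```

Both columns decay exponentially in $w$, with the exact probability staying below the Hoeffding bound.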

A similar argument gives an analogous result for the $\beta$-nodes.

Lemma 5.16 (Unhappy $\beta$-nodes). Given $\tau, \rho$ such that $\rho < \tau$, with high probability the number of initially
happy $\beta$-nodes in the process $(n, w, \tau, \rho)$ is $n\cdot e^{-\Omega(w)}$. If in addition $\tau + \rho < 1$, with high probability this
number is $n\cdot e^{-\Theta(w)}$.

Proof. Let $Y_j$ be 1 if $u_j$ is of type $\beta$ and happy, and 0 otherwise. Then provided that $\rho < \tau$, by Hoeffding's
inequality for Bernoulli variables we have that $E[Y_j] \le \rho\cdot e^{-4w(\tau-\rho)^2}$. Then Lemma 5.12 gives the first
clause of the claim. Now let us assume that we also have $\tau + \rho < 1$. Then by the second clause of Lemma 5.4
we get $E[Y_j] \ge \rho\cdot e^{-4w(\tau-\rho)^2/\rho}/4$. This application is possible with $p = \rho$ and $\epsilon = \tau - \rho$ because $\rho < 0.5$ and
$\tau + \rho < 1$, which means that $\epsilon < 1 - 2\rho$. Then by Lemma 5.12 we get the second clause of the claim. □

By a similar argument we get a bound on the total size of the incubators.

Lemma 5.17 (Number of incubators). If $\tau + \rho < 1$, the probability that a node belongs to an incubator is
$e^{-\Theta(w)}$. Hence with high probability the number of incubators as well as the number of nodes belonging to
incubators of the process $(n, w, \tau, \rho)$ is $n e^{-\Theta(w)}$.

Proof. Let $\epsilon_* = (1 - \tau - \rho)/2$, and let $X_j$ be the indicator variable of the event that the left semi-neighbourhood
of the $j$th node has less than $(\tau + \epsilon_*)w$ many $\alpha$-nodes. Given that $\tau + \rho < 1$, by Hoeffding's inequality for
Bernoulli variables (Lemma 5.4) and the tightness of it (Lemma 5.4), the probability that $X_j = 1$ is $e^{-\Theta(w)}$.
Let $Y_j$ be the indicator variable of the event that the $j$th node belongs to an incubator, so that the probability
that $Y_j = 1$ is $e^{-\Theta(w)}$ (since $(2w+1)e^{-\Theta(w)}$ is $e^{-\Theta(w)}$). Hence $E(Y_j)$ is $e^{-\Theta(w)}$. Then by Lemma 5.12 with high
probability the number of nodes belonging to incubators of the process $(n, w, \tau, \rho)$ is $n e^{-\Theta(w)}$. □


5.11 Accessibility of dormant states 

It is crucial to understand the dormant states and assess their accessibility from an initial state. We demonstrate
that this issue ultimately depends on the given parameters $\tau, \rho$. We show that if $\tau + \rho > 1$ then with
high probability we may assert that no dormant state is accessible from the initial state. On the other hand,
if $\tau + \rho < 1$ then with high probability there are permutations of the initial state which are dormant.
Proposition 5.18 (Existence of dormant states). If $\tau + \rho_* < 1$ then there are permutations of the initial state
which are dormant (provided that $n > 2w + 1/(1 - \tau - \rho_*)$).

Proof. Consider the state where the $\beta$-nodes occur in blocks of length $\lfloor(2w+1)\rho_*\rfloor$, which are divided by
blocks of $\alpha$-nodes of length at least $\lceil(2w+1)(1-\rho_*)\rceil$. Since $\lceil(2w+1)(1-\rho_*)\rceil = (2w+1) - \lfloor(2w+1)\rho_*\rfloor$
and $n > 2w + 1/(1 - \tau - \rho_*)$ we can consider an arrangement such that all blocks of $\alpha$-nodes have length
exactly $\lceil(2w+1)(1-\rho_*)\rceil$, except perhaps one which may be longer. In this state all $\alpha$-nodes are
happy and all $\beta$-nodes are unhappy. In particular, it is a dormant state. □
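The block arrangement used in this proof can be built and checked directly in a special case. The sketch below is illustrative and makes several assumptions: a node is taken to be happy when at least $\tau(2w+1)$ nodes of its neighbourhood (itself included) share its type, as the proofs above use the notion; $n$ is a multiple of $2w+1$ and $\rho_*(2w+1)$ is an integer, so that all blocks have exactly the lengths named in the proof; the parameter values and function names are hypothetical.

```python
from fractions import Fraction

def same_type_count(state, u, w):
    """Number of nodes in the neighbourhood of u (u included) that have the type of u."""
    n = len(state)
    return sum(1 for d in range(-w, w + 1) if state[(u + d) % n] == state[u])

def is_happy(state, u, w, tau):
    # Assumed reading of happiness: at least tau * (2w + 1) same-type nodes in the neighbourhood.
    return same_type_count(state, u, w) >= tau * (2 * w + 1)

def block_state(w, beta_len, periods):
    """Ring made of `periods` copies of a beta-block followed by an alpha-block, period 2w + 1."""
    alpha_len = (2 * w + 1) - beta_len
    return ("b" * beta_len + "a" * alpha_len) * periods

w, tau = 5, Fraction(3, 5)
state = block_state(w, beta_len=3, periods=4)   # rho_* = 3/11, so tau + rho_* < 1
alphas_happy = all(is_happy(state, u, w, tau) for u, t in enumerate(state) if t == "a")
betas_unhappy = all(not is_happy(state, u, w, tau) for u, t in enumerate(state) if t == "b")
print(alphas_happy, betas_unhappy)
```

Every neighbourhood in this state contains exactly 3 nodes of type $\beta$ and 8 of type $\alpha$, which is why no $\alpha$-node is unhappy and no swap is available.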

Lemma 5.19 (Existence of unhappy nodes). Suppose $\gamma \in \{\alpha, \beta\}$ and let $\theta_*$ be the proportion of $\gamma$-nodes in
a state of the process $(n, w, \tau, \rho)$. If $\tau > 0.5$ and $\theta_* < \tau$, then for $0 \ll w \ll n$ there exist unhappy $\gamma$-nodes in
the state.

Proof. Given the parameters $\theta_*, \tau, w$ (with $w$ large) and any state of the process $(n, w, \tau, \rho)$ with no unhappy
$\gamma$-nodes, it suffices to produce an upper bound on $n$ (which does not depend on the particular state but only on
$\theta_*, \tau, w$ and the fact that no $\gamma$-nodes are unhappy).

Let $\delta \in \{\alpha, \beta\} - \{\gamma\}$. Since $\tau > 0.5$ and all $\gamma$-nodes are happy, there are no $\delta$-blocks of length $\ge w$. We may
assume that $n > 3w+1$. Define the bias $B(I)$ of an interval $I$ of nodes to be the difference between the number
of $\gamma$-nodes in the interval and the number of $\delta$-nodes in the interval. Without loss of generality suppose that
the node occupying site $w$ is a $\gamma$-node (otherwise consider a rotation). We define a sequence $(u_i)$ of $\gamma$-nodes
in the state, starting with $u_0 = w$. Let $N_i$ denote the neighbourhood of $u_i$. Given $u_i$, define $u_{i+1}$ to be the
rightmost $\gamma$-node in $N_i$. Since there are no $\delta$-blocks of length $\ge w$, the sequence $(u_i)$ is well defined and
it never happens that $u_i = u_{i+1}$. Let $m$ be the largest number such that none of the neighbourhoods $N_i$ for
$0 \le i \le m$ contain the node at site 0. Since $n > 3w+1$ we have $m > 0$. Let $I_m = \bigcup_{i=0}^{m} N_i$ and $V_m = \sum_{i=0}^{m} B(N_i)$.
Note that $I_m$ contains all of the nodes except at most $w$. Moreover since $u_{i+1} - u_i \le w$ we have

$$|I_m| \le 2w + 1 + mw. \tag{5.11.1}$$

Let L_i and R_i be the leftmost and rightmost w-many nodes in N_i respectively. Since N_i contains at least 
τ(2w + 1) nodes of type γ: 

$$B(N_i) \ge (2w + 1)(2\tau - 1) \quad\text{and}\quad V_m \ge (m + 1)(2w + 1)(2\tau - 1). \tag{5.11.2}$$

Note, however, that some nodes have been counted multiple times in the sum that defines V_m, since the 
intervals N_i are not disjoint. For each k ∈ ℕ let J_k^m consist of the nodes in I_m which belong to exactly k 
distinct intervals N_i. 

By the definition of (u_i), the node u_{i+2} is always outside N_i. Similarly, u_{i+4} is always outside N_{i+2}. This 
means that it is not possible for the neighbourhoods of 5 consecutive terms of (u_i) to have a nonempty 
intersection. This, in turn, implies that J_k^m = ∅ for each k > 4. A similar consideration shows that J_4^m 
consists entirely of δ-nodes (hence B(J_4^m) ≤ 0). Next, note that J_1^m ⊆ L_0 ∪ R_m, so |J_1^m| ≤ 2w. Hence by 
counting the multiplicities of the nodes in the sum which defines V_m, we have 

$$V_m = 2B(I_m) - B(J_1^m) + B(J_3^m) + 2B(J_4^m) \quad\text{and}\quad V_m \le 2B(I_m) + 2w + B(J_3^m). \tag{5.11.3}$$

Let N'_i = N_{i-1} ∩ N_{i+1} and note that N'_i = R_{i-1} ∩ L_{i+1}. Moreover let L'_i = N'_i ∩ L_i and R'_i = N'_i ∩ R_i. By 
the definition of (u_i) it follows that if R'_i is nonempty, then it consists entirely of δ-nodes. Since u_i ∈ J_3^m for 
each i ∈ [1, m-1], N'_i = L'_i ∪ R'_i ∪ {u_i} and J_3^m ⊆ ∪_{i ∈ [1, m-1]} N'_i, we have: 

$$B(J_3^m) \le m + \sum_{i=1}^{m-1} \bigl(|L'_i| - |R'_i|\bigr). \tag{5.11.4}$$

Let d_i = u_i - u_{i-1}. Then |R'_i| = w - d_i and |L'_i| = w - d_{i+1}. Hence |L'_i| = |R'_{i+1}| and 

$$\sum_{i=1}^{m-1} \bigl(|L'_i| - |R'_i|\bigr) \le |L'_{m-1}| - |R'_1| \le w.$$

Then from (5.11.4) we get B(J_3^m) ≤ m + w. From the second clause of (5.11.2) and (5.11.3) we have 

$$2B(I_m) \ge (m + 1)(2w + 1)(2\tau - 1) - 3w - m. \tag{5.11.5}$$

If x_m, y_m are the numbers of γ-nodes and δ-nodes in I_m respectively, then x_m + y_m = |I_m| and x_m - y_m = B(I_m). Hence 
2x_m = |I_m| + B(I_m). By hypothesis we have x_m ≤ nθ*. Moreover, since n ≤ |I_m| + w we have x_m ≤ (|I_m| + w)θ*. 
Hence B(I_m) ≤ (2θ* - 1)|I_m| + 2wθ*, so by (5.11.1), 

$$B(I_m) \le mw(2\theta^* - 1) + 2w(3\theta^* - 1) + 2\theta^* - 1.$$


By (5.11.5) we may deduce that 

$$2m \cdot \bigl[2w(\tau - \theta^*) - (1 - \tau)\bigr] \le w(12\theta^* - 4\tau + 1) + 4\theta^* - 2\tau - 1. \tag{5.11.6}$$


We may assume that w is larger than (1 - τ)/[2(τ - θ*)]. By this condition and the fact that τ - θ* > 0, the 
left side of (5.11.6) is positive. Also, n ≤ |I_m| + w, so by (5.11.1) we have n ≤ 3w + 1 + mw. If we combine 
the latter inequality with (5.11.6) we get 

$$n \le 3w + 1 + w \cdot \frac{w(12\theta^* - 4\tau + 1) + 4\theta^* - 2\tau - 1}{4w(\tau - \theta^*) - 2(1 - \tau)}$$

which is the required bound on n. □ 
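
As a numerical illustration, the bound obtained by combining (5.11.6) with the inequality n ≤ 3w + 1 + mw can be evaluated directly. The following Python sketch is not part of the original argument: the function name `n_upper_bound` and its argument names are ours, and it simply evaluates the right-hand side of the final inequality, rejecting values of w for which the left side of (5.11.6) is not positive.

```python
from fractions import Fraction


def n_upper_bound(w, tau, theta):
    """Right-hand side of the final bound on n in the proof of Lemma 5.19.

    w     -- neighbourhood radius (a neighbourhood has 2w + 1 nodes)
    tau   -- the threshold, assumed to satisfy tau > 0.5
    theta -- the proportion theta* of gamma-nodes, assumed to satisfy theta < tau
    (The argument names are illustrative, not the paper's.)
    """
    denominator = 4 * w * (tau - theta) - 2 * (1 - tau)
    if denominator <= 0:
        # The proof assumes w > (1 - tau) / (2 (tau - theta)).
        raise ValueError("w is too small for the bound to apply")
    numerator = w * (12 * theta - 4 * tau + 1) + 4 * theta - 2 * tau - 1
    return 3 * w + 1 + w * numerator / denominator


# Fractions keep the arithmetic exact in this illustration.
bound = n_upper_bound(100, Fraction(3, 5), Fraction(2, 5))
print(float(bound))
```

Under the hypotheses of the lemma, any state with no unhappy γ-nodes must have n at most this value, so every state with larger n contains an unhappy γ-node.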


Note that in the above result, the lower bound that is required on w depends only on τ, ρ*, while the lower 
bound that is required on n depends on τ, ρ* and w. We may now apply Lemma 5.19 in order to establish 
the conditional existence of unhappy nodes of both types. 

Corollary 5.20 (Existence of unhappy nodes). Suppose that τ > 0.5, ρ* < τ and w is sufficiently large. 
Then for all sufficiently large n, every state of the process (n, w, τ, ρ) has unhappy β-nodes, and if τ + ρ* > 1 
then every state also has unhappy α-nodes. 

Given ρ, by the law of large numbers with high probability (tending to 1, as n tends to infinity) ρ* will be 
arbitrarily close to ρ. Hence we may deduce the absence of dormant states (with high probability) in the 
case that τ + ρ > 1. 

Corollary 5.21 (Absence of dormant states). If ρ < 0.5 < τ and τ + ρ > 1 then with high probability none 
of the accessible states of the process (n, w, τ, ρ) is dormant. 

This corollary along with Proposition 5.18 establishes the main dichotomy in the analysis of the process. 


5.12 Accessibility of complete segregation or dormant state 

A central part of our analysis is the fact that from any state there is a series of transitions to either a dormant 
state or complete segregation. This is what we prove in this section. This also means that the only absorbing 
states of the process are the dormant states. If τ ≤ 0.5 then it is clear that the only absorbing states of the 
process are the dormant states, since unhappy pairs of nodes of different type can always swap. It is also 
not difficult to find an appropriate Lyapunov function, establishing that a dormant state must eventually be 
reached. Consider the mixing index, which is non-negative and strictly decreasing in stages for τ ≤ 0.5. 
For the case where τ > 0.5 more effort is required. We argue in four steps. The numbers in what follows 
are fairly arbitrary. First we show that from a state with few unhappy nodes of one type (here 5w^4 is a 
convenient upper bound of what we mean by 'few', which is by no means optimal) there is a series of 
transitions which lead to either a state with a contiguous block of length 2w or a dormant state. Second, 
(assuming that τ > 0.5) from a state with a contiguous block of length ≥ 2w there is a series of transitions 
to complete segregation or to a dormant state. Third, from any state which has at least 2w^4 unhappy nodes of 
each type, there is a series of transitions to a state with a contiguous block of length at least w, and at least 
w^4 unhappy nodes of each type. Finally (if τ > 0.5) from a state that has a contiguous block of length ≥ w 
and at least 4w unhappy nodes of opposite type from the block, there is a series of transitions to a state 
with a contiguous block of length ≥ 2w. The combination of these four statements constitutes a strategy for 
arriving at a dormant state or a state of complete segregation, from any given state. 

In the following arguments we will often make use of the following two rather simple facts that hold when 
τ > 0.5. One is that (if w > (1 - τ)/(2τ - 1)), any β-node that is adjacent to a happy α-node is unhappy. The 
second concerns the situation where next to a happy α-node there is a β-node, and we swap the β-node for 
another α-node. Then, provided that before the swap the second α-node is outside the neighbourhood 
of the β-node, both α-nodes will be happy after the swap. 

Lemma 5.22 (Shortage of unhappy nodes). Suppose that τ > 0.5 and that 0 ≪ w ≪ n. From a state with 
fewer than 5w^4 unhappy nodes of one of the types, there is a series of transitions to either a dormant state or 
to a state containing a contiguous block of length at least 2w. 

Proof. Without loss of generality suppose that the state has fewer than 5w^4 unhappy α-nodes. Since ρ* ∈ 
(0,1), and 0 ≪ w ≪ n, if there does not already exist a contiguous block of length 2w then there exists an 
interval [u, v] of 2w nodes which contains at least one α-node and such that any unhappy α-node is at distance 
at least 2w^2 from any node in [u, v]. Any unhappy α-node which cannot see any node in [u, v] can move to 
any position in [u, v] that is adjacent to an α-node (because by doing so, it becomes happy and because if 
a swap is legal for one member of a potential swapping pair then it is legal for both). Hence we can start 
successively replacing the β-nodes in [u, v] which are adjacent to α-nodes, with unhappy α-nodes, each 
time choosing unhappy α-nodes that have maximal distance from u, v. Note that this recursive procedure 
is valid because all α-nodes in [u, v] are happy after each swap. Ultimately we either run out of unhappy 
α-nodes, or else [u, v] becomes an α-block. □ 

Lemma 5.23 (Toward a block of length w). Suppose τ > 0.5. If 0 ≪ w ≪ n then from any state which 
has at least 2w^4 unhappy nodes of each type, there is a series of transitions to a state with an α-block or 
β-block of length at least w, and at least w^4 many unhappy nodes of each type. 

Proof. Suppose that we are given a certain state of the process. Define a sequence u_i, i < w^2 of α- 
nodes with neighbourhoods N_{u_i} respectively, by induction as follows. Let u_0 be the least α-node whose 
neighbourhood contains the minimum number of α-nodes amongst all neighbourhoods of α-nodes. If u_i 
is defined and i < w^2, define u_{i+1} to be the least α-node whose neighbourhood is disjoint from ∪_{j≤i} N_{u_j} 
and whose neighbourhood contains the minimum number of α-nodes amongst all α-nodes with the same 
property (i.e. with neighbourhoods that are disjoint from ∪_{j≤i} N_{u_j}). This completes the definition of (u_i), 
which is sound provided that n is sufficiently large. We define a sequence v_i, i < w^2 of β-nodes with 
neighbourhoods N_{v_i} respectively, in a way entirely analogous to the above definition, ensuring also that all 
neighbourhoods N_{u_i} and N_{v_j} are disjoint. 

The sequences (u_i) and (v_i) provide a pool of nodes which will be used for legitimate swaps in a series of 
transitions which will lead to the desired state of the process. We start by considering an interval J of nodes 
of length 3w which is disjoint from ∪_{j<w^2} N_{u_j} and disjoint from ∪_{j<w^2} N_{v_j}. Such an interval exists, provided 
that n is sufficiently large. Let I consist of the w-many nodes in J that are at distance at least w + 1 from 
any node outside the interval. Clearly any swap that occurs between a node in I and one of the nodes u_i 
does not affect the composition of the neighbourhoods N_{u_j} for j ≠ i, or N_{v_j} for j < w^2 (and similarly for a 
swap between a node in I and one of the v_i). 

Let t_i, i < w be the nodes of I enumerated from left to right. We shall describe a swapping process, involving 
fewer than w^2 swaps. At the end of this process of legal swaps, all nodes in I will be of the same type (but 
which type that is will not be determined until the end of the process). This process has w-many steps, with 
each step s involving up to s swaps. Let γ_s be the type of t_s at the end of stage s. Also, let V_s contain the 
nodes u_i, v_i, i < w^2 which are of type γ_s and have not been involved in a swap by the end of stage s. The 
construction is designed so that γ_s is the type of all t_i, i ≤ s at the end of stage s. This feature guarantees that 
at the end of the process, all nodes in I have the same type. Stage 0 is null (i.e. we carry out no instructions 
at stage 0). 

At stage s + 1 we check if t_{s+1} has type γ_s. If so, then we go to the next stage. If not, then suppose first that 
t_{s+1} is unhappy. In the case that t_s is happy, any unhappy γ_s-node outside J can swap with t_{s+1} (because an 
unhappy γ_s-node moving next to a happy γ_s-node cannot decrease its utility). In the case that t_s is unhappy, 
we claim that any node x from V_s can legitimately swap with t_{s+1}. In order to see this, note that the number 
of γ_s-nodes in the neighbourhood of t_s is at least as large as this number at the beginning of the process. 
By the definition of V_s, this number is at least as large as the number of γ_s-nodes in the neighbourhood of 
x. This means that if x moves to the place that t_{s+1} occupies, its utility will not decrease. 

The last case in the procedure is if t_{s+1} is happy and of type different from γ_s. In this case we define 
γ_{s+1} ∈ {α, β} - {γ_s} and swap all t_i, i ≤ s with distinct nodes in V_{s+1}, starting with t_s and moving to the 
left. These are legitimate swaps, as nodes of type γ_{s+1} move next to happy nodes of the same type (so their 
utility is not decreased after the swap). This concludes the description of the process. 

By the end of stage w - 1, all nodes in I are of the same type. Since we perform fewer than w^2 many swaps, 
there are fewer than 2(2w + 1)w^2 many nodes whose neighbourhoods are affected by these swaps. Since w is 
large, there are therefore at least w^4 many unhappy nodes of each type remaining. □ 

Lemma 5.24 (Toward a contiguous block of length 2w). Suppose that τ > 0.5 and 0 ≪ w ≪ n. From a 
state that has an α-block of length ≥ w and at least w^4 unhappy nodes of each type, there is a series of 
transitions to a state with an α-block of length ≥ 2w. The same holds for β-blocks. 

Proof. Consider the given state and assume that there is no α-block of length ≥ 2w (otherwise 0 transitions 
suffice). Let [x, y] be the longest α-block in the given state, and let J consist of all the nodes that are at 
distance at least w from the interval [y - 2w, y]. 

Note that x - 1 is a β-node and since τ > 0.5 it is unhappy. Let z be the rightmost α-node to the left of x. 
If z is unhappy, then we may swap it with x - 1 since its utility will not decrease. Otherwise, if z is happy, 
then it is at a distance at most w from x and we may successively swap the β-nodes in (z, x), starting from 
z + 1 and moving to the right, for an equal number of unhappy α-nodes in J. This is possible because each 
time that we move an α-node next to a happy α-node, the new α-node becomes happy. 

We repeat this process until an α-block of length 2w has been formed. Each step of the process increases 
the length of the α-block that is adjacent and to the left of y. Therefore the process will terminate. We also 
perform at most w many swaps, meaning that we shall not run out of unhappy nodes to perform the swaps 
with. □ 

Lemma 5.25 (Complete segregation or dormant state from long block). Suppose τ > 0.5 and that 0 ≪ 
w ≪ n. From a state with a contiguous block of length ≥ 2w there is a series of transitions to complete 
segregation or to a dormant state. 

Proof. Consider any state which is not completely segregated, but which has a contiguous block of length 
at least 2w. Without loss of generality, suppose that this is a block of α-nodes occupying the interval [u, v], 
where this interval is chosen to be of maximum possible length. Our aim is to show that from this state, 
one may legally reach another with a contiguous block of greater length (or else a dormant state). Now 
if the nodes u and v are both happy then the length of the interval ensures that all nodes in the block are 
happy - this follows by induction on the distance from the edge of the interval by considering the difference 
between successive neighbourhoods. In this case, if there exists an unhappy α-node u', then let t ∈ {u, v} be 
at distance at least w + 1 from u'. Then u' and the β neighbour of t may legally be swapped, increasing the 
length of the run by at least 1. 

So suppose instead that at least one of the nodes u and v is not happy, and without loss of generality suppose 
that u has bias less than or equal to v, where the bias of a node is the number of α-nodes minus the number 
of β-nodes in its neighbourhood. Then u and v + 1 may legally be swapped. Performing this swap causes 
position v + 1 to have at least the same bias as v did before the swap, and causes u + 1 to have at most the 
same bias as u did before the swap. Thus, the swap has the effect of shifting the run one position to the 
right and may be repeated until the length of the run is increased by at least 1, i.e. for successive i ≥ 0 we 
can swap the nodes u + i and v + i + 1, so long as the latter is of type β. At the first stage at which the latter 
is of type α the length of the contiguous block has been increased. Putting these observations together, we 
conclude that from any state which has a contiguous block of length at least 2w it is possible to reach full 
segregation. □ 

Finally, we piece together the above processes in order to show the following comprehensive statement. 

Corollary 5.26 (Complete segregation or dormant state). From any state of the process (n, w, τ, ρ) with 
0 ≪ w ≪ n there exists a series of transitions to complete segregation or to a dormant state. 

Proof. The case τ ≤ 0.5 was considered earlier. Suppose that τ > 0.5. We may assume that ρ* ∈ (0,1), 
because otherwise every state is a dormant state. If there exist at most 5w^4 unhappy nodes of one of the types 
in the state, Lemma 5.22 shows how to reach a dormant state or a state with a contiguous block of length 
≥ 2w. In the latter case, Lemma 5.25 shows that there is a series of transitions to complete segregation or 
to a dormant state. 

So we may assume that the given state has more than 5w^4 unhappy nodes of each type. Then Lemma 5.23 
shows how to reach a state with a contiguous block of length ≥ w and at least w^4 many unhappy nodes of 
each type. Furthermore, from such a state Lemma 5.24 shows how to reach a dormant state or a state with 
a contiguous block of length ≥ 2w. In the latter case, Lemma 5.25 shows that there is a series of transitions 
to complete segregation or to a dormant state. This is an exhaustive analysis that establishes a path to a 
dormant state or complete segregation, from every state. □ 

This completes our proof of Theorem 1.2 for the case that τ + ρ > 1. 


5.13 Persistent blocks and unhappy nodes in intervals 

Now we focus on the case where τ > 0.5 and τ + ρ < 1. Having established that a low number of unhappy 
nodes suffices to ensure dormant states are inaccessible, we now wish to show that such a state is reached 
before any dormant state is reached. Since in this case there are always unhappy β-nodes, we are only 
concerned about the existence of unhappy α-nodes. One way to ensure this is to establish the existence of 
blocks of β-nodes of length ≥ w. 

Lemma 5.27 (Persistent β-block). Consider the process (n, w, τ, ρ) with τ > 0.5 and let s* be the least 
stage where the ratio between the very unhappy β-nodes and the unhappy α-nodes becomes less than 4w^2 
(putting s* = ∞ if no such stage exists). Then with high probability there is a β-block of length ≥ 2w at all 
stages < s* of the process. 

Proof. Let ε > 0, let δ = 2w/(2w + 1) - w/(w + 1), and let y be a sufficiently large integer so that 

e -2(y-2w)6 2 /wi (1 _ e -26 2 ^ < f/2 . (5. ]3.1) 


Since the initial state is random, as n → ∞ the probability that there is a β-block of length at least y in the 
initial state tends to 1. Hence (for sufficiently large n) we may assume that there is a β-block of length ≥ y 
in the initial state, with probability at least 1 - ε/2. Fix such a block and note that during the stages it may 
expand or retract. It suffices to show that, conditionally on the existence of such a block in the initial state, 
the probability that it shrinks to a block of length less than 2w before stage s* is bounded by ε/2. Let ℓ_s be 
the length of the block at stage s, so that ℓ_0 = y. Also let X_0 = 0 and for each s > 0 let X_s = ℓ_s - ℓ_{s-1}. Then 
X_s ≥ -w for all s, and ℓ_s = ℓ_0 + Σ_{t≤s} X_t. Let Z_s be -w if X_s < 0, and let Z_s be 1 if X_s > 0 (and Z_s = 0 if 
X_s = 0). Also let Y_s = Σ_{t≤s} Z_t + ℓ_0 - 2w, so if at stage s the length of the β-block becomes less than 2w, the 
random walk (Y_s) is ruined (by stage s). Let (Y'_s) be identical to (Y_s), except for stages after s*, at which it 
remains identical to Y_{s*-1}. Hence it suffices to show that the probability that (Y'_s) is ruined is bounded above 
by ε/2. Let p_s = P[X_s > 0 | X_s ≠ 0] and let q_s = P[X_s < 0 | X_s ≠ 0], so that p_s + q_s = 1. 

Since τ > 1/2, as long as the length of the block is at least w, the nearest α-node on each side is unhappy. 
Moreover these α-nodes can swap with any very unhappy β-node. Any such swap at stage s would make 
Z_s = 1. On the other hand, the only way that Z_s = -w (i.e. the length of the block is reduced) is that a 
β-node from one of the 2w outer nodes in the block (w on each side) is part of a swap at stage s (with an 
unhappy α-node). Hence according to our hypothesis we have p_s/q_s ≥ 2w for all s < s*. So 

$$P[X_s > 0 \mid X_s \ne 0] \ge \frac{2w}{2w + 1} \ge \frac{w}{w + 1} + \delta$$

which means that the walk (Y'_s) meets the requirements of Lemma 5.6. Hence the probability that (Y'_s) is 
ruined is bounded by the expression on the left-hand side of (5.13.1). We sum up our argument. Given 
ε > 0, we start with a block of β-nodes of length y, with probability at least 1 - ε/2. Conditionally on this 
starting assumption, our argument says that with probability at least 1 - ε/2 this block will continue to have 
length at least 2w at all stages up to stage s*. Hence the probability that there is no β-block of length 
≥ 2w at some stage < s* is less than ε. □ 

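The role of the constant δ in this proof can be made concrete with a short computation. The sketch below is ours (the function names are illustrative and Lemma 5.6 itself is not reproduced): for a walk whose nonzero steps are +1 with probability p and -w otherwise, the mean step is zero exactly at p = w/(w + 1), and the threshold 2w/(2w + 1) used above exceeds this break-even value by exactly δ.

```python
from fractions import Fraction


def delta(w):
    """The constant delta = 2w/(2w + 1) - w/(w + 1) from the proof of Lemma 5.27."""
    return Fraction(2 * w, 2 * w + 1) - Fraction(w, w + 1)


def expected_step(p, w):
    """Mean of a step that equals +1 with probability p and -w with probability 1 - p."""
    return p * 1 - (1 - p) * w


w = 3
break_even = Fraction(w, w + 1)          # the drift vanishes here
threshold = Fraction(2 * w, 2 * w + 1)   # lower bound on p_s for stages before s*
print(delta(w), expected_step(break_even, w), expected_step(threshold, w))
```

In closed form δ = w/((2w + 1)(w + 1)), and at the threshold the mean step is w/(2w + 1), so the walk has a strictly positive drift at every stage before s*.
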

Another tool that was used in our analysis is a bound on the number of unhappy β-nodes in the infected 
area, in terms of the number of α-nodes in the infected area. This is based on the fact that, when the number 
of α-nodes in an interval is limited, then the number of unhappy β-nodes in the same interval is also limited. 

Figure 17: Partition of patched block B in the proof of Lemma 5.28 


Lemma 5.28 (Proportions in a block of nodes). Consider a block of adjacent nodes which contains exactly 
x nodes of type α. Then for each y ∈ (0,1) there are at most x/y + 2w many β-nodes in the block for which 
the proportion of α-nodes in their neighbourhood is at least y. 

Proof. We are given a block of adjacent nodes A. Let us call a node weak if it is a β-node for which the 
proportion of α-nodes in its neighbourhood is less than y. It suffices to show that the number of β-nodes in 
A which are not weak is at most x/y + 2w. If we remove all of the weak nodes from A, thus obtaining a 
possibly different (and shorter) block B, then in the resulting configuration there are no weak nodes. It then 
suffices to show that the number of β-nodes in B is at most x/y + 2w. Note that the number of α-nodes in B 
remains x, since we did not remove any α-nodes. Let b_0 < b_1 be the endpoints of block B and define a finite 
sequence (β_i) of β-nodes as illustrated in Figure 17 and formally defined as follows. Let β_0 be the leftmost 
β-node in B such that the left endpoint of its neighbourhood is ≥ b_0 and such that the neighbourhood of β_0 
is entirely contained in B (if there exists such). Assuming that β_i is defined and there are β-nodes between 
the right endpoint of β_i and b_1, define β_{i+1} to be the leftmost β-node in B which is to the right of β_i, whose 
neighbourhood is disjoint from that of β_i and entirely contained in B. Let β_i, i < k be the sequence defined 
in this way. Then β_i, i < k have disjoint neighbourhoods, each of them containing at least y(2w + 1) nodes of 
type α. Hence ky(2w + 1) ≤ x so k(2w + 1) ≤ x/y, which means that the number of nodes that are contained 
in the union of these neighbourhoods is bounded by x/y. Since these are neighbourhoods of β-nodes that 
are not weak, the number of β-nodes that are contained in the union of these neighbourhoods is at most 
(x/y)(1 - y) = x/y - x. 

Let x_i be the distance between the right endpoint of the neighbourhood of β_i and the left endpoint of the 
neighbourhood of β_{i+1}. Note that for each i < k there is a block of at least x_i nodes of type α in the 
left semi-neighbourhood of β_{i+1}. Indeed, according to the definition of (β_j), the only reason why there is 
some distance d between the two endpoints is that there is a block of α-nodes of length d immediately to the left 
of β_{i+1}. We may conclude that there are at least Σ_i x_i nodes of type α. By the hypothesis the α-nodes are 
exactly x, so Σ_i x_i ≤ x. Hence the number of β-nodes in B that do not belong to the neighbourhood of some 
β_i, i < k is at most x + 2w (where 2w is an upper bound for the number of β-nodes in the final segment of 
B to the right of the neighbourhood of β_{k-1}, or the whole of B if k = 0). Hence, overall, there are at most 
(x/y - x) + (x + 2w) = x/y + 2w nodes of type β in B, which concludes the proof. □ 
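
The statement of Lemma 5.28 is easy to test numerically. The following sketch is ours and only illustrates the inequality on a random configuration: it assumes a ring of nodes labelled 'a' (for α) and 'b' (for β), takes the block to be the positions lo, ..., hi - 1 (read modulo the ring size, with hi - lo at most the ring size), and counts neighbourhoods as the 2w + 1 positions centred at a node.

```python
import random


def alpha_count(state, i, w):
    """Number of 'a' nodes among the 2w + 1 ring positions centred at i."""
    n = len(state)
    return sum(state[(i + d) % n] == "a" for d in range(-w, w + 1))


def non_weak_betas(state, lo, hi, w, y):
    """'b' nodes at positions lo..hi-1 whose neighbourhood has 'a'-proportion >= y."""
    n = len(state)
    return sum(
        1
        for i in range(lo, hi)
        if state[i % n] == "b" and alpha_count(state, i, w) >= y * (2 * w + 1)
    )


def lemma_bound_holds(state, lo, hi, w, y):
    """Check the count against x / y + 2w, where x is the number of 'a' nodes in the block."""
    n = len(state)
    x = sum(state[i % n] == "a" for i in range(lo, hi))
    return non_weak_betas(state, lo, hi, w, y) <= x / y + 2 * w


rng = random.Random(0)  # fixed seed so that the illustration is reproducible
state = [rng.choice("ab") for _ in range(60)]
print(lemma_bound_holds(state, 5, 45, w=3, y=0.4))
```

A randomized check of this kind is no substitute for the proof, but it is a convenient sanity test when experimenting with other values of w and y.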

A β-node is unhappy if and only if the proportion of α-nodes in its neighbourhood is more than 1 - τ. 
Hence applying Lemma 5.28 with y equal to a value that is slightly larger than 1 - τ (taking the limit 
y → 1 - τ from above and taking into account that the numbers of nodes are integers) gives the following 
bound on the number of unhappy β-nodes in a block. 

Corollary 5.29 (Unhappy β-nodes versus α-nodes). Consider a block of adjacent nodes of type α or β such 
that exactly x of these nodes are of type α. Then there are at most x/(1 - τ) + 2w unhappy β-nodes in this 
block. 

By applying this fact to each of the infected segments of the process, and adding up the numbers of unhappy 
nodes in each of the segments we see that Y_s ≤ Z_s/(1 - τ) + 2wC, which is the fact used in the main part 
of our analysis. 


5.14 Infected area and random variables 

In this case of unbalanced happiness (i.e. when τ > 0.5 and τ + ρ < 1, see Table 4) the unhappy α-nodes 
are initially very rare, so the interesting activity (namely α-to-β swaps) occurs in small intervals of the 
entire population (at least in the early stages). These intervals contain the unhappy α-nodes, and gradually 
expand, while outside these intervals all β-nodes are very unhappy. Figure 11 (produced from a simulation) 
shows the development of this process, where the height of the nodes (perpendicular lines) is proportional 
to the number of α-nodes in their neighborhood and the horizontal black line denotes the threshold where 
an α-node becomes unhappy. These cascades that spread the unhappy α-nodes are due to the following 
domino effect. An unhappy α-node moves out of a neighbourhood, thus reducing the number of α-nodes 
in that interval. This in turn often makes another α-node in the interval unhappy, which can move out at a 
later stage, thus causing another α-node nearby to be unhappy, and so on. The expanding intervals are the 
infected segments which start their life as incubators. 

Definition 5.30 (Incubators). Consider the set I of nodes in the initial state which belong to an interval 
of nodes of length w with less than ε* = w(1 - ρ + τ)/2 many α-nodes. Let I* be the set of nodes whose 
neighborhood contains a node in I. An incubator is a maximal interval of nodes that is entirely contained 
in I*. 
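
Definition 5.30 translates directly into a procedure for locating incubators in a given initial state. The sketch below is ours and makes several assumptions explicit: nodes sit on a ring and are labelled 'a' (α) or 'b' (β); an "interval of length w" is read as w consecutive ring positions; a neighbourhood is the 2w + 1 positions centred at a node; and the function name `incubators` and its return format (a list of inclusive endpoint pairs) are illustrative choices.

```python
def incubators(state, w, tau, rho):
    """Return the incubators of `state` as a list of (first, last) ring positions."""
    n = len(state)
    threshold = w * (1 - rho + tau) / 2  # the quantity w(1 - rho + tau)/2 of the definition

    # The set I: nodes lying in some window of w consecutive nodes with few 'a' nodes.
    in_I = [False] * n
    for start in range(n):
        window = [(start + d) % n for d in range(w)]
        if sum(state[j] == "a" for j in window) < threshold:
            for j in window:
                in_I[j] = True

    # The set I*: nodes whose neighbourhood meets I.
    in_I_star = [any(in_I[(i + d) % n] for d in range(-w, w + 1)) for i in range(n)]
    if all(in_I_star):
        return [(0, n - 1)]
    if not any(in_I_star):
        return []

    # Scan once around the ring, starting just after a node outside I*,
    # so that no maximal run is split by the wrap-around.
    origin = next(i for i in range(n) if not in_I_star[i])
    runs, run_start = [], None
    for k in range(1, n + 1):
        i = (origin + k) % n
        if in_I_star[i] and run_start is None:
            run_start = i
        elif not in_I_star[i] and run_start is not None:
            runs.append((run_start, (i - 1) % n))
            run_start = None
    return runs


# A ring of 40 nodes, all 'a' except for a short stretch of 'b' nodes.
state = ["a"] * 40
for i in range(10, 14):
    state[i] = "b"
print(incubators(state, w=4, tau=0.6, rho=0.3))
```

In this example the windows of length 4 with fewer than 2.6 nodes of type α cover positions 8 to 15, and widening by w on each side gives a single incubator on positions 4 to 19.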

An interval of nodes is called active at a certain state if it contains an unhappy α-node. The infected area 
is the area that incubators generate by making additional α-nodes unhappy. It is always expanding, and is 
defined formally as follows. 

Definition 5.31 (Infected segments). Let I be an incubator. At stage 0 the infected segment I_0 corresponding 
to I is I itself. At the end of stage s + 1, we first consider those I_s which were active at the end of stage 
s and which are still active. We consider these active infected segments in turn, starting at position 0 and 
moving clockwise. For each such I_s we let I_{s+1} = I_s ∪ J, where J consists of the nodes which do not already 
belong to another active infected segment (by the time we consider I_s) and whose neighborhood contains 
an unhappy α-node in I_s at stage s + 1. We then consider the remaining I_s (i.e. those which are no longer 
active): for each such we define I_{s+1} = I_s - Q where Q consists of the nodes in I_s which now belong to an 
active infected segment. 

The infected area is the union of the infected segments. The fresh infected segment corresponding to infected 
segment I is I - I_0, i.e. consists of the nodes of I except the nodes in its incubator. Hence a fresh infected 
segment consists of two growing intervals of nodes. The fresh infected area is the infected area except the 
nodes in the incubators. The interior of a set of nodes J consists of those nodes whose neighbourhood is 
entirely contained in J. The boundary of J consists of the nodes in J which are not in the interior. It is not 
hard to show that if τ + ρ < 1, the probability that a node belongs to an incubator is e^{-Θ(w)}. Hence with high 
probability the number of incubators as well as the number of nodes belonging to incubators of the process 
(n, w, τ, ρ) is ne^{-Θ(w)}. 

Our goal is now to show that the number of unhappy α-nodes remains suitably bounded throughout a 
significant part of the process. Formally, the main idea is to bound this number with a martingale. Intuitively 
though, why should the number of unhappy α-nodes remain fairly small? At the start of the process the 
infected area is a very small proportion of the entire ring. The vast majority of unhappy β-nodes occur 
outside the infected area, while all unhappy α-nodes are inside the infected area. It follows that with high 
probability a swap will involve an α-node in the infected area and a β-node outside the infected area. A 
bogus swap is one that is not of this kind. 

Definition 5.32 (Bogus swaps). A swap which involves a β-node currently inside the infected area is called 
bogus. Given an infected segment I, a bogus swap in I is a swap that moves an α-node into I. 

In the absence of bogus swaps, it is not hard to show that the α-nodes in the infected area (except those in 
the incubators) are unhappy. This in turn can be used in order to show that the α-nodes in the infected area 
(and so, the unhappy α-nodes too) are likely to remain o(n). However there will be bogus swaps, and these 
can make certain α-nodes in the infected area happy. 

Definition 5.33 (Anomalous nodes). A node is called actively anomalous at some stage of the process if it 
is a happy α-node in the interior of the fresh infected area; it is called anomalous if it has been actively 
anomalous in this or a previous stage. Finally a node is called generally anomalous at some stage, if it is 
in the current infected area and has been or will be actively anomalous at some later stage of the process. 

Clearly actively anomalous implies anomalous, which in turn implies generally anomalous (but not the 
other way around). Let D_s denote the number of anomalous nodes at stage s, and let D*_s denote the number 
of generally anomalous nodes at stage s. A martingale argument will be used in order to show that as long 
as D_s = o(n), the α-nodes in the infected area are likely to remain o(n). The definition of generally anomalous 
nodes and D*_s may seem strange at this point, not least because D*_s is not predictable at stage s. The reason that we 
introduce D*_s is that D_s is very hard to analyze, and very hard to bound directly via a martingale (adapted to 
the stages of the process). However it is possible to bound D*_s via a martingale argument of a more general 
type (i.e. which is not adapted to the stages of the process). Since D_s ≤ D*_s, this suffices for our purposes. 

We define additional global variables in Table 9. By the definitions we have D_s ≤ D_{s+1} and 

(a) U_α(s) ≤ Z_s   (b) U_β(s) ≤ G_s + Y_s   (c) G_s ≤ U_β(s)   (d) E[C] = ne^{-Θ(w)} 

Here (d) holds because of the likely total size of the incubators and (c) holds because β-nodes outside the 
infected area are very unhappy. 


5.15 Probabilities in the infected area and anomalous nodes 

Recall that our current goal is to show that the number of unhappy α-nodes remains suitably bounded 
for a significant part of the process. The basic idea is that if the number of unhappy α-nodes increases 
sufficiently, then the infected area must become quite large, and it becomes very likely that the next swap 
will involve an unhappy α-node in the interior of the infected area. We shall be able to argue that there are 
good chances that the swap is not bogus. This means that this α-node will move outside the infected area 
and will become happy. The anomalous nodes, however, present a difficulty with this line of argument. The 
eviction of the α-node from the infected area (and its replacement by a β-node) may produce more unhappy 
α-nodes in its neighbourhood. So it is not absolutely true that the total number of unhappy α-nodes will 
decrease. In fact, as the simulations of Figure 5 suggest, at the early stages of the process this number is 
likely to increase slightly. 

If we assume the absence of bogus swaps, then it is not hard to show that the nodes in the interior of 
the infected area and outside the incubators have neighborhoods in which the number of α-nodes is well below 
(2w + 1)τ. In this case it is straightforward to employ a martingale argument which shows that the number 
of α-nodes in the infected area (hence also the total number of unhappy α-nodes) remains bounded with 
high probability throughout the process. Indeed, in this case there will be no happy α-nodes in the interior 
of the fresh infected area, so (according to the argument we outlined above) the likely swap absolutely 
reduces the total number of unhappy α-nodes. 

In the presence of bogus swaps, we will use a more sophisticated martingale argument to bound the anomalous
nodes. This bound can then be used in another, simpler martingale argument, in order to bound the number of
unhappy $\alpha$-nodes, at least up to some stopping time of the process and with high probability. This plan
requires the calculation of certain probabilities.

Lemma 5.34 (Probability of a bogus swap). At each stage $s + 1$, the probability that the current swap will
be bogus is bounded above by $Y_s/G_s$.

Proof. The number of pairs which can cause a bogus swap is bounded by $U_\alpha(s) \cdot Y_s$. On the other hand,
any unhappy $\alpha$-node can swap with a $\beta$-node outside the infected area. Indeed, this is because the number
of $\alpha$-nodes in the neighbourhood of any $\beta$-node outside the infected area is at least $(2w + 1)\tau$. Hence there
are at least $U_\alpha(s) \cdot G_s$ pairs of nodes that can swap at stage $s + 1$. We can conclude that the probability of a
bogus swap is bounded by $U_\alpha(s) Y_s / (U_\alpha(s) G_s) = Y_s/G_s$. □


The calculation of the following probabilities is a first step towards our martingale argument.

Lemma 5.35 (Probabilities for $Z_s$). The numbers

$$\frac{G_s}{U_\alpha(s)} \cdot \frac{Z_s - D_s - 2w \cdot C}{G_s + Y_s} \qquad \text{and} \qquad 2w \cdot C \cdot \frac{G_s + Y_s}{G_s \cdot U_\alpha(s)}$$

are a lower bound for the probability that $Z_{s+1} < Z_s$ and an upper bound for the probability that $Z_{s+1} > Z_s$
respectively.


Proof. The probability that $Z_{s+1} < Z_s$ is at least as much as the probability that the swap is not bogus and
it involves a node in the interior of the infected area at stage $s + 1$. Indeed, in this case the swap moves an
$\alpha$-node from the interior of the infected area to outside the infected area, so $Z_{s+1} = Z_s - 1$, because the
length of the infected area remains the same. The unhappy $\alpha$-nodes of the infected area that cannot be part
of such a swap are the ones that belong to the boundary of the infected area, so they are at most $2wC$ many.
This means that there are at least $Z_s - D_s - 2wC$ nodes of type $\alpha$ which can be picked as part of a swapping
pair at stage $s + 1$ such that $Z_{s+1} - Z_s$ is negative. Note that each of these $\alpha$-nodes forms a swapping pair
with any $\beta$-node outside the infected area, since all such $\beta$-nodes are very unhappy. Therefore there are at
least $(Z_s - D_s - 2wC) \cdot G_s$ many swapping pairs which make $Z_{s+1} - Z_s$ negative. On the other hand, the
total number of swapping pairs is at most $(G_s + Y_s) \cdot U_\alpha(s)$. Hence

$$\frac{(Z_s - D_s - 2w \cdot C) \cdot G_s}{(G_s + Y_s) \cdot U_\alpha(s)}$$

is a lower bound for the probability that $Z_{s+1} < Z_s$.

For the second clause, note that $Z_{s+1} > Z_s$ can only happen in the case that the infected area expands at
stage $s + 1$. This can only occur if the swapping pair involves an $\alpha$-node that belongs to the boundary of the
infected area of stage $s$. There are at most $2wC$ such nodes so there are at most $2wC \cdot (G_s + Y_s)$ swapping
pairs that can cause $Z_{s+1} > Z_s$. Moreover there are at least $G_s \cdot U_\alpha(s)$ possible swapping pairs for stage
$s + 1$. Hence

$$2w \cdot \frac{C \cdot (G_s + Y_s)}{G_s \cdot U_\alpha(s)}$$

is an upper bound for the probability that $Z_{s+1} > Z_s$. □
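The pair-counting bounds of Lemmas 5.34 and 5.35 can be evaluated directly. The following is a minimal sketch, not part of the original argument: the function names and the sample values are hypothetical, the arguments stand for the quantities $Y_s$, $G_s$, $Z_s$, $D_s$, $C$, $w$ and $U_\alpha(s)$, and the clamping to the range of a probability is an added convenience.

```python
from fractions import Fraction


def bogus_swap_bound(Y, G):
    """Upper bound Y_s / G_s on the probability of a bogus swap (Lemma 5.34)."""
    return Fraction(Y, G)


def decrease_lower_bound(Z, D, C, w, G, Y, U_alpha):
    """Lower bound on P(Z_{s+1} < Z_s): favourable pairs over all swapping pairs."""
    # Clamp at zero (an addition): the count of usable alpha-nodes cannot be negative.
    favourable = max(Z - D - 2 * w * C, 0) * G
    total = (G + Y) * U_alpha
    return Fraction(favourable, total)


def increase_upper_bound(C, w, G, Y, U_alpha):
    """Upper bound on P(Z_{s+1} > Z_s): boundary pairs over a lower bound on all pairs."""
    bound = Fraction(2 * w * C * (G + Y), G * U_alpha)
    # Clamp at one (an addition): the raw ratio may exceed 1 for small G or U_alpha.
    return min(bound, Fraction(1))
```

Exact rational arithmetic is used so that the bounds can be compared without rounding effects.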
We may now identify our first supermartingale. Note that the following fact is the reason why we defined
the anomalous nodes the way we did. The fact that $D_s$ is nondecreasing is a necessary part of the following
proof.

Lemma 5.36 (Non-anomalous nodes in an infected segment). The process $Z^*_s := \max\{Z_s - D_s,\ 11w^2 \cdot C\}$
is a supermartingale, for all $s < T_y$.

Proof. At the end of stage $s$ (and given all information as to how the process has unfolded so far) denote
the probability that $Z_{s+1} < Z_s$ by $q$ and the probability that $Z_{s+1} > Z_s$ by $p$. Let $E$ be the expected value of
$Z_{s+1}$. Now at stage $s + 1$ the infected area can expand by at most $w$ nodes. Moreover, it is not possible that
at stage $s + 1$, an $\alpha$-node which is not in the infected area of stage $s$ is moved to a position in the infected
area of stage $s + 1$. This is because all $\alpha$-nodes outside the infected area of stage $s$ are happy at stage $s$. It
follows that $Z_{s+1} - Z_s \le w$ at each stage $s$. Therefore

$$E \le p \cdot (Z_s + w) + q \cdot (Z_s - 1) + (1 - p - q) \cdot Z_s = Z_s + wp - q. \qquad (5.15.1)$$

By Lemma 5.35, in order to ensure that $wp - q < 0$, it suffices that

$$2w^2 \cdot C \cdot \frac{G_s + Y_s}{G_s \cdot U_\alpha(s)} < \frac{G_s}{U_\alpha(s)} \cdot \frac{Z_s - D_s - 2w^2 \cdot C}{G_s + Y_s}$$

that is,

$$Z_s > D_s + 2w^2 \cdot C \cdot \left(1 + \left(\frac{G_s + Y_s}{G_s}\right)^2\right).$$

Since $s < T_y$ we have $Y_s \le G_s$, so the expression inside the inner parentheses in the latter inequality is bounded above by 2. Hence for
the condition $wp - q < 0$ it is sufficient that $Z_s > D_s + 10w^2 \cdot C$ for all $s < T_y$. So now we divide into two
cases. If $Z_s \le D_s + 10w^2 \cdot C$ then $Z^*_{s+1} = Z^*_s = 11w^2 \cdot C$. Otherwise, $E < Z_s$ and the result follows from
the fact that $D_s$ is non-decreasing. □
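The drift computation in the proof of Lemma 5.36 can be checked numerically. The sketch below is illustrative only: the function names are hypothetical, the arguments stand for $Z_s$, $D_s$, $C$, $w$, $G_s$, $Y_s$ and $U_\alpha(s)$, and the lower bound on $q$ is taken in the weaker form with $2w^2 \cdot C$ that appears in the displayed inequality of the proof.

```python
from fractions import Fraction


def drift_upper_bound(Z, D, C, w, G, Y, U_alpha):
    """Upper bound w*p - q on E[Z_{s+1}] - Z_s, as in (5.15.1)."""
    # Upper bound on p = P(Z_{s+1} > Z_s) from Lemma 5.35.
    p_upper = Fraction(2 * w * C * (G + Y), G * U_alpha)
    # Lower bound on q = P(Z_{s+1} < Z_s), weakened to 2 w^2 C as in the proof.
    q_lower = Fraction(G * (Z - D - 2 * w * w * C), U_alpha * (G + Y))
    return w * p_upper - q_lower


def sufficient_condition(Z, D, C, w, G, Y):
    """The condition used in the proof: Y_s <= G_s and Z_s > D_s + 10 w^2 C."""
    return Y <= G and Z > D + 10 * w * w * C
```

Whenever `sufficient_condition` holds, `drift_upper_bound` is negative, which is the supermartingale step of the proof; when $Z_s$ is close to $D_s$ the bound can be positive, which is why $Z^*_s$ is truncated from below at $11w^2 \cdot C$.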


Now to get from $Z^*_s$ to $Z_s$, we need to bound $D_s$. Intuitively, we expect the proportion of the $\alpha$-nodes in
neighborhoods of nodes in the interior of the infected area to be rather low, e.g. considerably lower than
the threshold $(2w + 1)\tau$. The following lemma gives a justification for such an expectation and is also
the reason why we chose $\epsilon_* = (1 - \tau - \rho)/2$ in the definition of incubators, Definition 5.30. Here is an
intuitive explanation of this fact. Let us say that a node in the infected area which is not in the interior of
the infected area is in the boundary of the infected area. A node in the boundary of the infected area can
see a node outside the infected area. The nodes in the complement of the infected area have never seen
unhappy $\alpha$-nodes, hence the proportion of $\alpha$-nodes in their semi-neighbourhoods can only increase. This
means that one of the semi-neighbourhoods of each node in the boundary of the infected area has not been
affected by $\alpha$-to-$\beta$ swaps. The following lemma says that such a node can only be included in the interior
of the infected area if the semi-neighbourhood of it which has been affected by $\alpha$-to-$\beta$ swaps is affected by
at least $\epsilon_* w$ many such swaps. In other words, the expansion of the infected area requires a considerable
number of stages. The particular statement refers to the case where the infection travels from right to left.
By symmetry, an analogous statement holds for the case where the infection travels in the opposite direction.

Lemma 5.37 (Concentration of $\alpha$-to-$\beta$ swaps). Let $[a, d]$ be an interval of nodes in the initial state of the
process, and $\delta > 0$ such that for each $u \in [a, d]$ the proportion of $\alpha$-nodes in each semi-neighbourhood of $u$
is at least $\tau + \delta$. Consider a time interval of the process where there have been no $\alpha$-to-$\beta$ swaps in $[a - w, a)$.
For each $u \in [a, d]$ and any stage in this interval, if there is an unhappy $\alpha$-node in $[a, u]$ then there have
been at least $2w\delta$ many $\alpha$-to-$\beta$ swaps in the right semi-neighbourhood of $u$ by that stage.

Proof. Let $s$ be a stage of the process and suppose that there have been no $\alpha$-to-$\beta$ swaps in $[a - w, a)$ by stage
$s$. Suppose that there is an unhappy $\alpha$-node in $[a - w, u]$ at stage $s$. Then there must have been an unhappy
$\alpha$-node in $[u - w, u]$ at some stage $\le s$. Consider the first such stage $t_0$ and let $v_0$ be the rightmost $\alpha$-node
in $(u - w, u]$ which became unhappy at stage $t_0$. By our hypothesis, up to stage $t_0$ there have been no $\alpha$-to-$\beta$
swaps in $(v_0, u]$. Hence all of the $\alpha$-to-$\beta$ swaps that occurred in the right semi-neighbourhood of $v_0$ are also
in the right semi-neighbourhood of $u$. The proportion of the $\alpha$-nodes in the left semi-neighbourhood of $v_0$
is more than $\tau + \delta$. Since $v_0$ is unhappy at $t_0$, the proportion of the $\alpha$-nodes in its neighbourhood is less than
$\tau$. Hence the proportion of the $\alpha$-nodes in its right semi-neighbourhood is at most $\tau - \delta$ at stage $t_0$. Hence by
hypothesis, by stage $t_0$ at least $2w\delta$ many $\alpha$-to-$\beta$ swaps have occurred in the right semi-neighbourhood of $v_0$.
By the above discussion, these swaps have also occurred in the right semi-neighbourhood of $u$. □

According to the definition of incubators, this fact is relevant for $\delta = (1 - \tau - \rho)/2$ and shows that the infected
area expands reasonably slowly in the stages of the process $(n, w, \tau, \rho)$. Indeed, the proportion of $\alpha$-nodes in
the neighbourhood of any node outside the infected area at any particular stage is at least $\tau + (1 - \tau - \rho)/2$.
This also shows that, in the absence of bogus swaps, all $\alpha$-nodes in the interior of the fresh infected area
are always unhappy (i.e. there are no anomalous nodes). In the presence of bogus swaps this is no longer
true, and this is why we have to work in order to bound the spread of anomalous nodes.


5.16 Bounding the anomalous nodes 

Recall that $D_s$ denotes the number of anomalous nodes at stage $s$. In this section we construct a martingale
process which shows that $D_s$ is likely to be bounded appropriately, throughout a significant part of the
Schelling process. This argument requires us to consider the random variables localized into the individual
infected segments. Recall the stopping times defined in the second part of Table 3. We use $\tau\rho n/(4w)$ rather
than $\tau\rho n/(3w)$ in the definition of $T_g$ so as to allow for the slight discrepancy which one might expect
between $\rho$ and $\rho^*$.

Definition 5.38 (Stopping times). Let $T_g$ be the least stage $s$ such that $G_s < \tau\rho n/(4w)$. Define $T_y$ to be
the first stage $s$ which is either $T_g$ or else such that $Y_s > G_s$. Finally let $T_{mix}$ be the first stage for which
$\mathrm{mix} < n(w + 1)\tau\rho^*$. In all cases, if the stage described does not exist then we define the corresponding
stopping time to be $\infty$.
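As an illustration of Definition 5.38, the stopping times $T_g$ and $T_y$ can be read off from a finite record of the process. This is a minimal sketch under assumptions: the function name is hypothetical, `G` and `Y` are assumed to be lists holding $G_s$ and $Y_s$ for $s = 0, 1, \dots$, and a stage that does not occur within the record is reported as infinity.

```python
import math


def stopping_times(G, Y, n, w, tau, rho):
    """Return (T_g, T_y) for recorded sequences G_s and Y_s.

    T_g: first stage s with G_s < tau * rho * n / (4 w).
    T_y: first stage that is either T_g or has Y_s > G_s.
    """
    threshold = tau * rho * n / (4 * w)
    T_g = next((s for s, g in enumerate(G) if g < threshold), math.inf)
    first_y_exceeds = next(
        (s for s, (g, y) in enumerate(zip(G, Y)) if y > g), math.inf
    )
    return T_g, min(T_g, first_y_exceeds)
```

By construction $T_y \le T_g$, which is the relation used repeatedly in Section 5.17.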

Given an infected segment $I$, let $\hat{D}_s = \hat{D}_s(I)$ be the number of nodes in $I_s$ that will ever become anomalous,
up to stage $T_g$. This is a version of the generally anomalous nodes $\hat{D}_s$. A stage is called an $I$-stage if a swap
occurs involving a node from $I$.

If $(v(s))$ is an enumeration of the $I$-stages, let $D^*_s = \hat{D}_{v(sw^5)}$ and $I^*_s = I_{v(sw^5)}$.

We use $*$ as a superscript in other variables in the following, in order to indicate that they are 'jump
processes' in the sense that they are not updated at every stage or even every $I$-stage of the Schelling
process. For example, $D^*_s$ is only updated every $w^5$ many $I$-stages of the Schelling process.

Recall that we may view the underlying probability space $\Omega$ as a tree, where the nodes are states and
branchings correspond to state transitions. Let $\Omega \wedge T_g$ denote the subspace restricted to the stages up to
time $T_g$ (which may be infinite). Normally we would say that an event $\mathcal{R} \subseteq \Omega \wedge T_g$ is $I$-independent if
it did not impose any branching restrictions regarding the $I$-stages that occur in the reals in it. We give a
slightly more general definition which is more appropriate for the argument to follow. An event $\mathcal{R} \subseteq \Omega \wedge T_g$
is called $I$-independent if for each $\beta \in \mathcal{R}$ and any $s$ such that the transition from $\beta \upharpoonright s$ to $\beta \upharpoonright (s+1)$ occurs at
an $I$-stage, $(\beta \upharpoonright s) * S \in \mathcal{R}$ for every state $S$ that is obtained from $\beta \upharpoonright s$ through a non-bogus swap. A filtration
$\mathcal{R}_s \subseteq \mathcal{R}_{s+1} \subseteq \Omega \wedge T_g$ is called $I$-independent if for each $s$ the event $\mathcal{R}_s$ is $I$-independent. Analogously, a
process $(\mathbf{J}_s)$ on $\Omega$ is called $I$-independent if its natural filtration is $I$-independent. Intuitively, a
process $(\mathbf{J}_s)$ on the underlying probability space $\Omega$ is $I$-independent if, for each $s$, fixing the value of $\mathbf{J}_s$ does
not impose any restriction on (i.e. is compatible with all) the transitions of the Schelling process from stage
$s$ to stage $s + 1$ that involve a non-bogus swap and a node from $I$. Here we use boldface font for $\mathbf{J}_s$ because
this process will typically be global, in the sense that it involves information about the process that is not
restricted to the infected segment $I$. In the following lemma we use $(\mathbf{J}^*_s)$ for the underlying $I$-independent
global process in order to indicate that it refers to the subsequence of stages $sw^5$ of the process, much like
$D^*_s$.

Lemma 5.39 ($I$-supermartingale). Given an infected interval $I$, the process $D^*_s - 10sw$ is a supermartingale
relative to any $I$-independent process $\mathbf{J}^*_s$ to which $D^*_s$ is adapted.

Proof. Given an $I$-independent process $\mathbf{J}^*_s$ such that $D^*_s$ is adapted to $\mathbf{J}^*_s$ (i.e. $D^*_s$ is a function of $\mathbf{J}^*_s$) it
suffices to show that $\mathbb{E}[D^*_s \mid \mathbf{J}^*_{s-1}] \le D^*_{s-1} + 10w$ for all $s$. Let $D^{*0}_s$ be the number of nodes in the left
semi-interval of $I_s$ that will ever become anomalous, up to stage $T_g$. Similarly let $D^{*1}_s$ be the number of nodes in
the right semi-interval of $I_s$ that will ever become anomalous, up to stage $T_g$. Clearly $D^*_s = D^{*0}_s + D^{*1}_s$. So
it suffices to show that

$$\mathbb{E}[D^{*i}_s \mid \mathbf{J}^*_{s-1}] \le D^{*i}_{s-1} + 5w \quad \text{for each } i = 0, 1.$$

Similarly, let $I^{*0}_s$ be the left interval of the fresh part of $I^*_s$ and let $I^{*1}_s$ be the right interval of the fresh part of
$I^*_s$. Fix $i = 0, 1$ and set $H^{*i}_s = I^{*i}_s - I^{*i}_{s-1}$. In order to bound the expectation of $D^{*i}_s$, we consider the following
cases (where each case applies only if the one above it fails):

(a) $|H^{*i}_s| < 4w$;

(b) There are bogus swaps in the $I$-stages $(s - 1)w^5$ to $sw^5$;

(c) A happy $\alpha$-node appears in the interior of $H^{*i}_s$ before the interior becomes all $\beta$-nodes;

(d) The above $\beta$-firewall forms, but it shrinks by $4w$ at some later $I$-stage $tw^5$;

(e) Otherwise.

We will show that all of these events yield small expectation (conditional on $\mathbf{J}^*_{s-1}$) on the number of happy
$\alpha$-nodes that will ever appear in the interval $H^{*i}_s$ after $I$-stage $sw^5$ of the original process (in particular, the
probabilities of (b)-(d) are very small). We decide to accept $4w$ happy $\alpha$-nodes in $H^{*i}_s$ as a desirable (i.e. not
too high) count. So, irrespective of likelihood, event (a) is desirable. Note that by Lemma 5.37,

in $w^5$ many $I$-stages $I$ cannot grow by more than $2w^5/(1 - \tau - \rho)$. (5.16.1)

Note that Lemma 5.34 also holds locally, by the same proof. In other words, given an interval of nodes
of length $\ell$, the probability that at stage $s + 1$ a bogus swap will occur involving a $\beta$-node from the
given interval is bounded above by $\ell/G_s$. Since all stages are bounded by $T_g$, it follows that the probability
(conditional on $\mathbf{J}^*_{s-1}$) of a bogus swap in an area of length $\ell$ is less than $4w\ell/(n\tau\rho)$. Hence by (5.16.1),
event (b) has probability (conditional on $\mathbf{J}^*_{s-1}$) upper bounded by $w^{2 \cdot 5+2}/n$. In this case we can bound the
expectation trivially by $w^{3 \cdot 5+2}/n = w^{17}/n$.

Now suppose that (a), (b) do not occur so that, by Lemma 5.37, each subinterval of length $w$ in the interior
of $H^{*i}_s$ has $\alpha$-proportion at most $\tau - \epsilon_*$ at $I$-stage $sw^5$, where recall that $\epsilon_* = (1 - \tau - \rho)/2$. In particular, all
$\alpha$-nodes in the interior of $H^{*i}_s$ are unhappy, and remain so unless $w\delta$ bogus swaps happen in $H^{*i}_s$. We wish to
show that in this case

event (c) has probability (conditional on $\mathbf{J}^*_{s-1}$) upper bounded by $(w^{3 \cdot 5}/n)^{w\delta}$.

Indeed couple this process (conditional on $\mathbf{J}^*_{s-1}$, where each stage is either a non-bogus swap in $H^{*i}_s$, or
something else) with a gambler's ruin process, where the gambler has $w^5/\delta$ chips and the house has $w\delta$
chips, and the ratio of the winning probabilities is less than $q = 4w^{5+2}/n$ in favor of the house. Then we can
estimate an upper bound on the probability that $w\delta$ bogus swaps occur in $H^{*i}_s$ before all the interior turns into a
$\beta$-firewall. According to the standard gambler's ruin result, this is

$$\frac{1 - q^{w^5/\delta}}{q^{-w\delta} - q^{w^5/\delta}} < \frac{1}{q^{-w\delta}} < (w^{3 \cdot 5}/n)^{w\delta}$$

which is also a bound on the (conditional) probability of event (c). Now assume that (a)-(c) do not occur,
and let us estimate an upper bound for the probability of (d). Again, couple this process (conditionally on
$\mathbf{J}^*_{s-1}$) with a biased random walk where a negative move corresponds to a bogus swap moving something
from the $w$ border (one or the other) of the firewall, and a positive move is swapping the $\alpha$-node at the edge
with a $\beta$-node (other events are ignored). The ratio of the probabilities is bounded above by $2w/(n\tau\rho/4w)$
which is bounded by $w^3/n$. Also note that a negative move chips (at most) $w$ away from the firewall, while
a positive move only contributes (at least) one node to the firewall. Then the probability that it will eat up
$tw$ at any future time is bounded by $w \cdot (w^3/n)^{t-1}$. For $t = 4$ we get

event (d) has probability (conditional on $\mathbf{J}^*_{s-1}$) upper bounded by $w^{10}/n^3$.

Then the expectation of the number of anomalous nodes that will ever appear in $H^{*i}_s$ is bounded by

$$4w + 2 \cdot \frac{w^{2 \cdot 5+2}}{n} \cdot w^5 < 4w + \frac{w^{3 \cdot 5+3}}{n} < 5w.$$

Finally under case (e) it is clear that the conditional expectation of $D^{*i}_s$ is also bounded by $D^{*i}_{s-1} + 5w$.
Considering all the different cases, by the law of alternatives for conditional expectation we have that
$\mathbb{E}[D^{*i}_s \mid \mathbf{J}^*_{s-1}] \le D^{*i}_{s-1} + 5w$, which concludes the proof. □
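The gambler's ruin estimate used for event (c) in the proof above has the closed form $(1 - a^i)/(a^{-j} - a^i)$, where $a < 1$ is the ratio of the up-step to the down-step probability, $j$ is the gain needed (here $w\delta$ bogus swaps) and $i$ is the loss tolerated (here $w^5/\delta$). The following sketch, with hypothetical function names and integer $i$, $j$ for simplicity, evaluates this expression and the standard hitting probabilities it comes from.

```python
from fractions import Fraction


def win_probability(a, i, j):
    """P(the walk gains j before losing i), with a = P(up)/P(down) < 1 per step."""
    a = Fraction(a)
    return (1 - a**i) / (a**(-j) - a**i)


def hitting_probabilities(a, i, j):
    """h[k] = P(reach i + j before 0 | start at k), for k = 0, ..., i + j.

    Standard closed form h[k] = (1 - r^k) / (1 - r^(i+j)) with r = P(down)/P(up).
    """
    r = 1 / Fraction(a)
    N = i + j
    return [(1 - r**k) / (1 - r**N) for k in range(N + 1)]
```

Starting from position $i$, the two functions agree, and the value is strictly below $a^j$, which is the inequality $< 1/q^{-w\delta}$ used in the proof.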

Let $I_j$, $j < t$ be the infected segments (and $I_j[s]$ their state at stage $s$). Recall that $\hat{D}_s$ is the sum of all $\hat{D}_s(I_j)$,
$j < t$. In order to bound $\hat{D}_s$ we need to prove a global version of Lemma 5.39. An immediate obstacle is
the asynchrony of the $I$-stages with respect to the various infected segments $I$. We need to find a process
$\mathbf{L}_s$ relative to which $\hat{D}_s$ (or some 'asynchronous' version $\tilde{D}_s$ of it) is a supermartingale.

For each $j < t$ let $\tau_s(j)$ be the stage where exactly $s \cdot w^5$ many $I_j$-stages have occurred. Also let $(\tau_i)$ be
a monotone enumeration of the times $\{\tau_s(j) \mid j < t, s \in \mathbb{N}\}$. Let $\Lambda_s(j)$ be $\tau_m(j)$ for the maximum $m$ such
that $\tau_m(j) \le s$. Let $\tilde{D}_s$ be the sum of all $\hat{D}_{\Lambda_s(j)}(I_j)$, $j < t$. The point of this definition is that $\tilde{D}_s$ considers
the values of $\hat{D}(I_j)$, $j < t$ at the last stage $\le s$ where they completed a cycle (which happens at every $w^5$ many
$I_j$-stages) and outputs their sum. Define $\mathbf{L}_s$ to be the vector containing the tuples $(\hat{D}_{\Lambda_s(j)}(I_j), \Lambda_s(j))$ for each
$j < t$. In this way, the process $(\tilde{D}_s)$ is adapted to $(\mathbf{L}_s)$ (in other words, for each $s$, the value of $\tilde{D}_s$ is a
function of $\mathbf{L}_s$). Note that $\tilde{D}_s$ remains constant in the intervals $[\tau_s, \tau_{s+1})$, just as $\Lambda_x(j)$ remains constant for $x$ in
the interval $[\tau_s(j), \tau_{s+1}(j))$.
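The bookkeeping behind the asynchronous process $\tilde{D}_s$ can be sketched as follows. All names are hypothetical: `segment_stages` is assumed to be the sorted list of stages at which a given segment $I_j$ is hit, `cycle_len` stands for $w^5$, and `d_hat_by_stage` is an assumed lookup giving $\hat{D}(I_j)$ at each cycle-completion stage.

```python
from bisect import bisect_right


def cycle_times(segment_stages, cycle_len):
    """Stages at which exactly m * cycle_len hits of the segment have occurred, m = 0, 1, ..."""
    times = [0]  # zero hits have occurred at stage 0
    for m in range(cycle_len, len(segment_stages) + 1, cycle_len):
        times.append(segment_stages[m - 1])
    return times


def last_cycle_time(times, s):
    """The largest cycle-completion stage that is <= s (the analogue of Lambda_s(j))."""
    return times[bisect_right(times, s) - 1]


def d_tilde(s, segments, cycle_len):
    """Sum over segments of the value recorded at the last completed cycle by stage s.

    segments: list of (segment_stages, d_hat_by_stage) pairs.
    """
    total = 0
    for segment_stages, d_hat_by_stage in segments:
        times = cycle_times(segment_stages, cycle_len)
        total += d_hat_by_stage[last_cycle_time(times, s)]
    return total
```

The value returned by `d_tilde` changes only at stages where some segment completes a cycle, which mirrors the remark that $\tilde{D}_s$ is constant between consecutive times of the enumeration.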

Lemma 5.40. The process $\tilde{D}_{\tau_s} - 20sw$ is a supermartingale relative to the process $\mathbf{L}_{\tau_s}$.

Proof. Using the law of alternatives for conditional expectation, it suffices to show that for each $s$ there is
a (finite) partition $\mathcal{R}$ of events relative to $\mathbf{L}_{\tau_s}$ such that for each $A \in \mathcal{R}$ we have

$$\mathbb{E}_A[\tilde{D}_{\tau_{s+1}} \mid \mathbf{L}_{\tau_s}] \le \tilde{D}_{\tau_s} + 20w \quad \text{for all } s. \qquad (5.16.2)$$

Each event $A \in \mathcal{R}$ describes which pair of infected intervals $I_j$ completes a cycle at stage $\tau_{s+1}$, and the
sequence of $I_j$-stages (for each of the two $j$) from $\Lambda_{\tau_s}(j)$ to $\tau_{s+1}$. Formally, event $A$ is a tuple
$(m_0, m_1)$ where $m_i < t$, and for each $i = 0, 1$ an increasing sequence of stages starting from $\Lambda_{\tau_s}(m_i)$ and
ending on the same number $a$. If $m_0 = m_1$ then the two sequences should be the same. The meaning of
$A$ is that $\tau_{s+1} = a$ and infected intervals with indices $m_i$ are hit at stage $a$, with the sequence of stages
representing the exact stages from $\Lambda_{\tau_s}(m_i)$ to $a$ where a swap occurs in $I_{m_i}$. By the definition of $A$, this event
is $I_{m_i}$-independent for $i = 0, 1$. At stage $\tau_{s+1}$ of the process there must be exactly one tuple $(m_0, m_1)$ where
$m_i < t$, such that the swap occurred in $I_{m_0}$ and $I_{m_1}$. For each such event $A$ on $\mathbf{L}_{\tau_s}$ we have

$$\mathbb{E}_A[\tilde{D}_{\tau_{s+1}} \mid \mathbf{L}_{\tau_s}] = \sum_{j<t} \mathbb{E}_A[\tilde{D}_{\tau_{s+1}}(I_j) \mid \mathbf{L}_{\tau_s}].$$

But for $j \ne m_0, m_1$ we have $\mathbb{E}_A[\tilde{D}_{\tau_{s+1}}(I_j) \mid \mathbf{L}_{\tau_s}] = \tilde{D}_{\tau_s}(I_j)$ and by Lemma 5.39 we have

$$\mathbb{E}_A[\tilde{D}_{\tau_{s+1}}(I_{m_i}) \mid \mathbf{L}_{\tau_s}] \le \tilde{D}_{\tau_s}(I_{m_i}) + 10w \quad \text{for } i = 0, 1$$

since $A$ is $I_{m_i}$-independent for $i = 0, 1$. Therefore (5.16.2) holds for each of the events $A$. By the law of
alternatives, and since there can be at most two infected segments that complete a cycle at stage $\tau_{s+1}$, we
get

$$\mathbb{E}[\tilde{D}_{\tau_{s+1}} \mid \mathbf{L}_{\tau_s}] \le \tilde{D}_{\tau_s} + 20w \quad \text{for all } s.$$

Therefore $\tilde{D}_{\tau_s} - 20sw$ is a supermartingale adapted to $\mathbf{L}_{\tau_s}$. □

Corollary 5.41. Let $a \in \mathbb{N}$. With probability $> 1 - 1/a$, for all $s < T_g$ we have $D_s < a + \frac{40s}{w^4} + n e^{-\Theta(w)}$.

Proof. By Lemma 5.40 and the maximal inequality for supermartingales, given any $a > 1$, with probability
at least $1 - 1/a$ we have $\tilde{D}_{\tau_j} < a + 20jw$ for all $j$ with $\tau_j < T_g$. Since each stage can be an $I_j$-stage for at most two
distinct $j < t$, we have $|\{i \mid \tau_i \le s\}| \le 2s/w^5$. Hence for each $a > 1$ we have

with probability $> 1 - 1/a$, $\tilde{D}_s < a + 20w \cdot \frac{2s}{w^5} = a + \frac{40s}{w^4}$ for all $s < T_g$. (5.16.3)

Also, note that at each stage $s$ we have $\hat{D}_s(I_j) \le \tilde{D}_s(I_j) + w^5$ for each $j < t$. Hence $\hat{D}_s \le \tilde{D}_s + nw^5 e^{-\Theta(w)}$.
Since we also have $D_s \le \hat{D}_s$ for all $s$, the corollary follows from (5.16.3). □


5.17 Bounding the arrival time to a safe state 

By Lemma 5.36 and Corollary 5.41 we have the desired bound on $Z_s$.

Corollary 5.42. Let a e N. With probability >1-1 /a, for all s <T y we have Z s < a + ^ + ne~° (w) . 

By Lemma 5.34 and Corollaries 5.29 and 5.42 we have the following.

Corollary 5.43. Let $a \in \mathbb{N}$. With probability $> 1 - 1/a$, for all $s < \min\{T_y, n\}$ we have $w^3 \cdot Z_s = o(n)$ and
$p_s = o(1)$.

The following result is the technical basis for the result that with high probability a safe state will be reached
(at some finite stage). It says that, with high probability the stopping times $T_y$, $T_g$ are equal and are bounded
by $n$.

Lemma 5.44 (Stopping times). With probability $1 - o(1)$ we have $T_y = T_g \le n$.

Proof. Let $\epsilon > 0$ such that $1 - \epsilon > \rho + 1/8$. By Hoeffding's inequality for Bernoulli trials we may consider $n$
large enough such that the probability that $G_0 > (\rho + 1/8)n$ is less than $\epsilon/4$. Recall that $p_s$ is the probability
of a bogus swap at stage $s + 1$. Suppose that $w$ is large enough such that with probability at least $1 - \epsilon/4$

(a) $w^2 C < n\tau\rho/32$;

(b) $wZ_s < n\tau\rho \cdot (1 - \tau)/32$ for each $s < \min\{T_y, n\}$;

(c) $p_s < \epsilon^3/16$ for each $s < \min\{T_y, n\}$.

Clause (a) can be ensured by Lemma 5.17. Clause (b) can be ensured by Corollary 5.42. Clause (c) can be
ensured by Corollary 5.43. First, for a contradiction, assume that $T_y < T_g$. Then $G_{T_y} < Y_{T_y}$. By Corollary
5.43 and since (by definition) $T_y \le T_g$ we have

■ rp/4 < w ■ Gp < w 


Zp, 2 hto{ 1 - r) nrp nrp 

-+ w • -+-<- 

l-r 32 32 16 


which is the required contradiction. Hence with probability $> 1 - \epsilon/4$ we have $T_y = T_g$. Second, we show
that with probability at least $1 - \epsilon/2$ we have $T_y \le n$. By clause (c) above,

with probability at least $1 - \epsilon/4$, at all stages $s < \min\{T_y, n\}$ we have $p_s < \epsilon^2/4$. (5.17.1)

By (5.17.1), with probability at least $1 - \epsilon/4$, the expectation of the number of bogus swaps that have
occurred by stage $T_y$ is $< \epsilon^2 \cdot T_y/4$. Hence, conditionally on the event that $p_s < \epsilon^2/4$ for all stages
$s < \min\{T_y, n\}$, the probability that by stage $\min\{T_y, n\}$ more than $\epsilon n$ bogus swaps have occurred is less than
$\epsilon/4$. Hence the unconditional probability that by stage $\min\{T_y, n\}$ at most $\epsilon T_y$ bogus swaps have occurred
is at least $(1 - \epsilon/4)^2 > 1 - \epsilon/2$.

We conclude the argument. We have established that the probability of the event $T_y < T_g$ or $G_0 > n(\rho + 1/8)$
is bounded by $\epsilon/2$. It remains to show that outside this rare event, $T_y \le n$. Since every non-bogus swap
reduces $G_s$ by (at least) 1, and $G_0 \le n(\rho + 1/8)$, $\rho < 0.5$, with probability at least $1 - \epsilon/2$ we have

$$G_{T_y} \le G_0 - (1 - \epsilon)T_y \;\Rightarrow\; T_y \le (G_0 - G_{T_y})/(1 - \epsilon) \le n(\rho + 1/8)/(1 - \epsilon) < n$$

which shows that $T_y = T_g \le n$ with probability at least $1 - \epsilon$. □
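The last display of the proof is elementary arithmetic: the choice $1 - \epsilon > \rho + 1/8$ is exactly what makes $n(\rho + 1/8)/(1 - \epsilon)$ smaller than $n$. A small sketch, with a hypothetical function name and sample values chosen only for illustration:

```python
from fractions import Fraction


def stage_bound(n, rho, eps):
    """The bound n (rho + 1/8) / (1 - eps) on T_y from the final display."""
    rho, eps = Fraction(rho), Fraction(eps)
    # The proof fixes eps > 0 with 1 - eps > rho + 1/8; reject other inputs.
    if not (eps > 0 and 1 - eps > rho + Fraction(1, 8)):
        raise ValueError("need eps > 0 and 1 - eps > rho + 1/8")
    return n * (rho + Fraction(1, 8)) / (1 - eps)
```

For instance, with $n = 1000$, $\rho = 2/5$ and $\epsilon = 1/4$ the bound evaluates to 700, which is below $n$ as required.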

Corollary 5.45 (Safe state arrival). Suppose that $\tau + \rho < 1$. Then with high probability the process $(n, w, \tau, \rho)$
reaches a safe state, and then complete segregation.

Proof. Let $\epsilon > 0$. By the law of large numbers, with probability at least $1 - \epsilon/4$ and sufficiently large $n$ we
have $3\rho < 4\rho^*$. Pick $w$, $n$ large enough such that

(a) $T_g = T_y \le n$ with probability $> 1 - \epsilon/4$;

(b) $2w \cdot Z_{T_g}/(1 - \tau) < n\tau\rho/4$ with probability $> 1 - \epsilon/4$;

(c) $C(w + 1) < n\tau\rho/(4w)$ with probability $> 1 - \epsilon/4$.

Clause (a) can be ensured by Lemma 5.44 and clause (b) can be ensured by Corollary 5.42. Clause (c) can
be ensured by Lemma 5.17. By the definition of $T_g$, $G_{T_g} < \tau\rho n/(4w)$. Hence by Corollary 5.43 we have

$$U_{T_g} \le G_{T_g} + Y_{T_g} + Z_{T_g} \le C(w + 1) + G_{T_g} + 2Z_{T_g}/(1 - \tau) < \frac{3n\tau\rho}{4w} < \frac{n\tau\rho^*}{w}$$

with probability $> 1 - \epsilon$. But $\mathrm{mix} \le U \cdot w(w + 1)$ so the mixing index at stage $T_g$ is less than $n\tau\rho^* \cdot (w + 1)$.
In other words, $T_{mix} \le T_g$, so by Proposition 5.3 the process at stage $T_g$ is in a safe state, with probability
more than $1 - \epsilon$. Hence by Corollary 5.26, the process will arrive at complete segregation with probability
at least $1 - \epsilon$. □